Histograms with ActiveRecord


I have two entity types in a one-to-many association:
A --1:*--> B
I would like to obtain a histogram of the number of B records per A using ActiveRecord.
Something like:
A.id | count(B)
1 | 20
2 | 32
3 | 332
4 | 0
[ {:id=>1, :count=>20},{:id=>2,:count=>32}, ... ]
I could do it directly in MySQL, but I was wondering what the proper way of doing it in ActiveRecord is.

As far as I know, it's always going to be a bit of a mix and match of AR and SQL.
Let's say you want to count comments for each post. The following code:
Post.joins(:comments).group("posts.id").count("comments.id")
will produce something like:
{2=>304, 3=>329, 4=>46, 6=>342}
where the hash keys are the post ids. Notice, however, that because of the inner join enforced by joins you will only get posts with existing comments. But, just like in your example, it makes sense to also list post ids that have zero comments.
You might then think to include, rather than joining comments:
Post.includes(:comments).group("posts.id").count("comments.id")
But that won't work. For reasons I am not completely sure about, the includes is ignored by AR, which results in a database error.
In the end, I resorted to the following much more explicit and SQL-ish query:
Post.select("posts.id, COUNT(comments.id) AS comm_count")
.joins("LEFT OUTER JOIN comments ON posts.id=comments.post_id")
.group("posts.id")
This will return an array of actual Post models (not just :id => count hashes) with the added attribute comm_count, which is what you need.
Of course you may add other post attributes to the select, such as the title, the content, etc.
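The difference between the inner join and the left outer join can be checked outside of Rails. The following is a minimal sketch in Python with an in-memory SQLite database, not ActiveRecord itself; the table contents and the helper name comment_histogram are made up for illustration.

```python
import sqlite3

# Hypothetical sample data: post 2 has no comments.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE comments (id INTEGER PRIMARY KEY, post_id INTEGER);
    INSERT INTO posts (id, title) VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO comments (post_id) VALUES (1), (1), (3);
""")

def comment_histogram(connection, join_type):
    """Return {post_id: comment_count} using the given join type."""
    sql = f"""
        SELECT posts.id, COUNT(comments.id) AS comm_count
        FROM posts
        {join_type} JOIN comments ON posts.id = comments.post_id
        GROUP BY posts.id
        ORDER BY posts.id
    """
    return dict(connection.execute(sql).fetchall())

inner = comment_histogram(conn, "INNER")
left = comment_histogram(conn, "LEFT OUTER")
print(inner)  # {1: 2, 3: 1}
print(left)   # {1: 2, 2: 0, 3: 1}
```

COUNT(comments.id) only counts non-NULL values, which is why the post without comments shows up with 0 rather than 1 in the left outer join.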

Related

Hasura GraphQL query : where clause with value from related entity

I recently started using GraphQL through a Hasura layer on top of a PostgreSQL DB. I am having troubles implementing a basic query.
Here are my entities :
article :
articleId
content
publishedDate
categoryId
category :
categoryId
name
createdDate
What I am trying to achieve, in English, is the following : get the articles that were published in the first 3 days following the creation of their related category.
In pseudo-SQL, I would do something similar to this :
SELECT Article.content, Category.name
FROM Article
INNER JOIN Category ON Article.categoryId = Category.categoryId
WHERE Article.publishedDate < Category.createdDate + 3 days
Now as a GraphQL query, I tried something similar to this :
query MyQuery {
  articles(where: {publishedDate: {_lt: category.createdDate + 3 days}}) {
    content
    category {
      name
    }
  }
}
Unfortunately, it does not recognise the “category.createdDate” in the where clause. I tried multiple variations, including aliases, with no success.
What am I missing ?

As far as I understand the Hasura docs, there is no way to reference another field within a query the way you can in SQL. But that does not mean you can't do what you are trying to do. There are three ways of achieving the result that you want:
1. Create a filtered view
CREATE VIEW earliest_articles AS
SELECT Article.*
FROM Article
INNER JOIN Category ON Article.categoryId = Category.categoryId
WHERE Article.publishedDate < Category.createdDate + INTERVAL '3 days'
This view should now become available as a query. Docs for views are here.
2. Create a view with a new field
CREATE VIEW articles_with_creation_span AS
SELECT
Article.*,
(Article.publishedDate - Category.createdDate) AS since_category_creation
FROM Article
INNER JOIN Category ON Article.categoryId = Category.categoryId
Now you can again query this view and filter on the extra field. This solution is useful if you want to vary the amount of time that you query for. The downside is that there are now two different article types, so it might make sense to hide the regular articles query.
3. Use a computed field
You can also define computed fields in Hasura. These fields are not only included in the output object type of the corresponding GraphQL type, but can also be used for querying. Refer to the docs on how to create a computed field. Here you can again calculate the difference and then use a comparison operator (<) to check whether it is smaller than '3 days'.
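To make the second option concrete, here is a rough sketch of a view with a precomputed span, run in Python against an in-memory SQLite database rather than PostgreSQL or Hasura. The julianday arithmetic is SQLite-specific, and the sample rows and the helper articles_within are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE category (categoryId INTEGER PRIMARY KEY, name TEXT, createdDate TEXT);
    CREATE TABLE article (articleId INTEGER PRIMARY KEY, content TEXT,
                          publishedDate TEXT, categoryId INTEGER);
    INSERT INTO category VALUES (1, 'news', '2020-01-01');
    INSERT INTO article VALUES (10, 'early', '2020-01-02', 1),
                               (11, 'late',  '2020-01-10', 1);

    -- SQLite stand-in for the PostgreSQL view: the span is stored in days.
    CREATE VIEW articles_with_creation_span AS
    SELECT article.*,
           julianday(article.publishedDate) - julianday(category.createdDate)
               AS since_category_creation
    FROM article
    INNER JOIN category ON article.categoryId = category.categoryId;
""")

def articles_within(connection, days):
    """Contents of articles published fewer than `days` days after their category."""
    rows = connection.execute(
        "SELECT content FROM articles_with_creation_span "
        "WHERE since_category_creation < ? ORDER BY articleId",
        (days,),
    )
    return [content for (content,) in rows]

print(articles_within(conn, 3))   # ['early']
print(articles_within(conn, 30))  # ['early', 'late']
```

Because the span is an ordinary column of the view, the threshold becomes a plain filter value, which is the property that makes this option usable from a GraphQL where clause.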

Filtering simultaneously on count of related objects and on count of related objects that satisfy a condition in Django

So I have models amounting to this (very simplified, obviously):
class Mystery(models.Model):
    name = models.CharField(max_length=100)

class Character(models.Model):
    mystery = models.ForeignKey(Mystery, related_name="characters")
    required = models.BooleanField(default=True)
Basically, in each mystery there are a number of characters, which can be essential to the story or not. The minimum number of actors that can stage a mystery is the number of required characters for that mystery; the maximum number is the number of characters total for the mystery.
Now I'm trying to query for mysteries that can be played by some given number of actors. It seemed straightforward enough using the way Django's filtering and annotation features function; after all, both of these queries work fine:
# Returns mystery objects with at least x characters in all
Mystery.objects.annotate(max_actors=Count('characters', distinct=True)).filter(max_actors__gte=x)
# Returns mystery objects with no more than x required characters
Mystery.objects.filter(characters__required=True).annotate(min_actors=Count('characters', distinct=True)).filter(min_actors__lte=x)
However, when I try to combine the two...
Mystery.objects.annotate(max_actors=Count('characters', distinct=True)).filter(characters__required=True).annotate(min_actors=Count('characters', distinct=True)).filter(min_actors__lte=x, max_actors__gte=x)
...it doesn't work. Both min_actors and max_actors come out containing the maximum number of actors. The relevant parts of the actual query being run look like this:
SELECT `mysteries_mystery`.`id`,
`mysteries_mystery`.`name`,
COUNT(DISTINCT `mysteries_character`.`id`) AS `max_actors`,
COUNT(DISTINCT `mysteries_character`.`id`) AS `min_actors`
FROM `mysteries_mystery`
LEFT OUTER JOIN `mysteries_character` ON (`mysteries_mystery`.`id` = `mysteries_character`.`mystery_id`)
INNER JOIN `mysteries_character` T5 ON (`mysteries_mystery`.`id` = T5.`mystery_id`)
WHERE T5.`required` = True
GROUP BY `mysteries_mystery`.`id`, `mysteries_mystery`.`name`
...which makes it clear that while Django is creating a second join on the character table just fine (the second copy of the table being aliased to T5), that table isn't actually being used anywhere and both of the counts are being selected from the non-aliased version, which obviously yields the same result both times.
Even when I try to use an extra clause to select from T5, I get told there is no such table as T5, even as examining the output query shows that it's still aliasing the second character table to T5. Another attempt to do this with extra clauses went like this:
Mystery.objects.annotate(max_actors=Count('characters', distinct=True)).extra(select={'min_actors': "SELECT COUNT(*) FROM mysteries_character WHERE required = True AND mystery_id = mysteries_mystery.id"}).extra(where=["`min_actors` <= %s", "`max_actors` >= %s"], params=[x, x])
But that didn't work because I can't use a calculated field in the WHERE clause, at least on MySQL. If only I could use HAVING, but alas, Django's .extra() does not and will never allow you to set HAVING parameters.
Is there any way to get Django's ORM to do what I want?

How about combining your Count()s:
Mystery.objects.annotate(max_actors=Count('characters', distinct=True),min_actors=Count('characters', distinct=True)).filter(characters__required=True).filter(min_actors__lte=x, max_actors__gte=x)
This seems to work for me but I didn't test it with your exact models.

It's been a couple of weeks with no suggested solutions, so here's how I ended up going about it, for anyone else who might be looking for an answer:
Mystery.objects.annotate(max_actors=Count('characters', distinct=True)).filter(max_actors__gte=x, id__in=Mystery.objects.filter(characters__required=True).annotate(min_actors=Count('characters', distinct=True)).filter(min_actors__lte=x).values('id'))
In other words, filter on the first count and on IDs that match those in an explicit subquery that filters on the second count. Kind of clunky, but it works well enough for my purposes.
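For comparison, in plain SQL both counts can come from a single join by using conditional aggregation. The sketch below uses Python with in-memory SQLite instead of Django and MySQL; the table names, sample rows and the helper playable_by are assumptions for illustration. Recent Django versions accept a filter argument on aggregates such as Count, which should express the same idea in the ORM, though that is not tested here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE mysteries (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE characters (id INTEGER PRIMARY KEY, mystery_id INTEGER, required INTEGER);
    INSERT INTO mysteries VALUES (1, 'small'), (2, 'large');
    -- 'small': 3 characters, 2 required; 'large': 5 characters, 4 required
    INSERT INTO characters (mystery_id, required) VALUES
        (1, 1), (1, 1), (1, 0),
        (2, 1), (2, 1), (2, 1), (2, 1), (2, 0);
""")

def playable_by(connection, x):
    """Names of mysteries with min_actors <= x <= max_actors."""
    sql = """
        SELECT name FROM (
            SELECT m.id, m.name,
                   COUNT(c.id) AS max_actors,
                   SUM(CASE WHEN c.required = 1 THEN 1 ELSE 0 END) AS min_actors
            FROM mysteries m
            LEFT OUTER JOIN characters c ON c.mystery_id = m.id
            GROUP BY m.id, m.name
        ) AS counts
        WHERE min_actors <= ? AND max_actors >= ?
        ORDER BY id
    """
    return [name for (name,) in connection.execute(sql, (x, x))]

print(playable_by(conn, 3))  # ['small']
print(playable_by(conn, 4))  # ['large']
```

Wrapping the aggregate in a derived table sidesteps the problem of referring to a calculated field in WHERE, since the outer query sees min_actors and max_actors as ordinary columns.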

SQL Multiple IN statements on one column

Okay, I'm using WordPress, but this pertains to the SQL side.
I have a query in which I need to filter out posts using three different categories, but they're all terms in the post.
For example:
In my three categories, I select the following: (Academia,Webdevelopment) (Fulltime,Parttime) (Earlycareer).
Now what I want to do is make sure when I query that the post has AT LEAST ONE of each of those terms.
CORRECT RESULT: A post with tags Academia, Fulltime, Earlycareer
INCORRECT RESULT: A post with tags Academia, Earlycareer (doesn't have fulltime or parttime)
Currently, my query looks something like this:
SELECT * FROM $wpdb->posts WHERE
(
$wpdb->terms.slug IN (list of selected from category 1) AND
$wpdb->terms.slug IN (list of selected from category 2) AND
$wpdb->terms.slug IN (list of selected from category 3)
)
AND $wpdb->term_taxonomy.taxonomy = 'jobtype' AND .......
When using this query, it returns no results when I select across the different categories (that is, I can choose 4 things from category 1 and it has results, but I can't choose anything from category 2 or 3. And vice versa)
I'm not sure if this is something to do with using IN more than once on the same column.
Thanks in advance for any help!

Your query seems to be correct. SQL places no limitation on using IN on the same column multiple times.
But make sure that none of your "list of selected from category 1/2/3" queries contains NULL values. Even a single NULL value in these lists will give NULL as the result of the whole WHERE condition, and you will get nothing back.
If this doesn't help, then it must be a WordPress issue.
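One further thing worth checking: if each joined row carries a single term slug (an assumption about how the tables are joined), then one row can never satisfy three IN lists that share no values, so the AND of the three conditions matches nothing. A possible alternative is one EXISTS test per category. The sketch below uses Python with in-memory SQLite and a simplified, hypothetical post_terms table rather than the real WordPress schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical flattened table: one row per (post, term slug).
    CREATE TABLE post_terms (post_id INTEGER, slug TEXT);
    INSERT INTO post_terms VALUES
        (1, 'academia'), (1, 'fulltime'), (1, 'earlycareer'),
        (2, 'academia'), (2, 'earlycareer');
""")

# Same-row version: a single slug cannot be in all three lists at once.
same_row = conn.execute(
    "SELECT post_id FROM post_terms "
    "WHERE slug IN ('academia', 'webdevelopment') "
    "AND slug IN ('fulltime', 'parttime') "
    "AND slug IN ('earlycareer')"
).fetchall()

def posts_matching_all_groups(connection, groups):
    """Post ids having at least one term from every group (groups must be non-empty)."""
    clauses, params = [], []
    for slugs in groups:
        placeholders = ", ".join("?" for _ in slugs)
        clauses.append(
            "EXISTS (SELECT 1 FROM post_terms t "
            "WHERE t.post_id = p.post_id AND t.slug IN (" + placeholders + "))"
        )
        params.extend(slugs)
    sql = (
        "SELECT DISTINCT p.post_id FROM post_terms p WHERE "
        + " AND ".join(clauses)
        + " ORDER BY p.post_id"
    )
    return [post_id for (post_id,) in connection.execute(sql, params)]

groups = [["academia", "webdevelopment"], ["fulltime", "parttime"], ["earlycareer"]]
matching = posts_matching_all_groups(conn, groups)
print(same_row)  # []
print(matching)  # [1]
```

Post 2 is excluded because it has no term from the second group, which matches the "incorrect result" case described in the question.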

Query for restricting associated entities

I would like to form a query where an associated collection has been
restricted, ideally with Hibernate Criteria or HQL, but I'd be
interested in how to do this in SQL. For example, say I have a Boy
class with a bidirectional one-to-many association to the Kites class.
I want to get a List of the Boys whose kites' lengths are in a range.
The problem is that the HQL/Criteria I know only gets me Boy objects
with a complete (unrestricted) Set of Kites, as given in these two
examples (the HQL is a guess). I.e., I get the Boys who have Kites
in the right range, but for each such Boy I get all of the Kites, not
just the ones in the range.
select new Boy(name) from Boy b
inner join Kite on Boy.id=Kite.boyId
where b.name = "Huck" and length >= 1;
Criteria crit = session.createCriteria(Boy.class);
crit.add(Restrictions.eq("name", "Huck"))
.createCriteria("kites")
.add(Restrictions.ge("length", new BigDecimal(1.0)));
List list = crit.list();
Right now the only way I have to get the correct Kite length Sets is
to iterate through the list of Boys and for each one re-query Kites
for the ones in the range. I'm hoping some SQL/HQL/Criteria wizard
knows a better way. I'd prefer to get a Criteria solution because my
real "Boy" constructor has quite a few arguments and it would be handy
to have the initialized Boys.
My underlying database is MySQL. Do not assume that I know much about
SQL or Hibernate. Thanks!

I'm no Hibernate expert, but since you say you're interested in the SQL solution as well:
In SQL, I assume you mean something like (with the addition of indices, keys, etc):
CREATE TABLE Boys (Id INT, Name VARCHAR(16));
CREATE TABLE Kites (Length FLOAT, BoyID INT, Description TEXT);
plus of course other columns &c that don't matter here.
All boys owning 1+ kites with lengths between 1.0 and 1.5:
SELECT DISTINCT Boys.*
FROM Boys
JOIN Kites ON(Kites.BoyID=Boys.ID AND Kites.Length BETWEEN 1.0 AND 1.5)
If you also want to see the relevant kites' description, with N rows per boy owning N such kites:
SELECT Boys.*, Kites.Length, Kites.Description
FROM Boys
JOIN Kites ON(Kites.BoyID=Boys.ID AND Kites.Length BETWEEN 1.0 AND 1.5)
Hope somebody can help you integrate these with Hibernate...!

It turns out that this is best done by reversing the join:
Criteria crit = session.createCriteria(Kite.class);
crit.add(Restrictions.ge("length", new BigDecimal(1.0)))
.createCriteria("boy")
.add(Restrictions.eq("name", "Huck"));
List<Kite> list = crit.list();
Note that the Kites in the list need to be aggregated into Boys; this
can be done easily with a HashMap.
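The aggregation step could look roughly like the following, shown here in Python with a dictionary standing in for the HashMap. The Kite class and its fields are simplified stand-ins for the real entities, not Hibernate-mapped classes.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class Kite:
    # Simplified stand-in: the owning boy is identified by name only.
    boy_name: str
    length: float

def kites_by_boy(kites):
    """Group a flat list of kites into {boy_name: [kites]} preserving order."""
    grouped = defaultdict(list)
    for kite in kites:
        grouped[kite.boy_name].append(kite)
    return dict(grouped)

kites = [Kite("Huck", 1.2), Kite("Huck", 1.5), Kite("Tom", 2.0)]
grouped = kites_by_boy(kites)
print({name: [k.length for k in ks] for name, ks in grouped.items()})
# {'Huck': [1.2, 1.5], 'Tom': [2.0]}
```

Each boy then maps only to the kites that passed the restriction, which is the restricted collection the question asks for.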

LINQ exclusion

Is there a direct LINQ syntax for finding the members of set A that are absent from set B? In SQL I would write this
SELECT A.* FROM A LEFT JOIN B ON A.ID = B.ID WHERE B.ID IS NULL

See the MSDN documentation on the Except operator.

var results = from itemA in A
where !B.Any(itemB => itemB.Id == itemA.Id)
select itemA;

I believe your LINQ would be something like the following.
var items = A.Except(
from itemA in A
from itemB in B
where itemA.ID == itemB.ID
select itemA);
Update
As indicated by Maslow in the comments, this may well not be the most performant query. As with any code, it is important to carry out some level of profiling to remove bottlenecks and inefficient algorithms. In this case, chaowman's answer provides a better performing result.
The reasons can be seen with a little examination of the queries. In the example I provided, there are at least two loops over the A collection - 1 to combine the A and B list, and the other to perform the Except operation - whereas in chaowman's answer (reproduced below), the A collection is only iterated once.
// chaowman's solution only iterates A once and partially iterates B
var results = from itemA in A
where !B.Any(itemB => itemB.Id == itemA.Id)
select itemA;
Also, in my answer, the B collection is iterated in its entirety for every item in A, whereas in chaowman's answer, it is only iterated up to the point at which a match is found.
As you can see, even before looking at the SQL generated, you can spot potential performance issues just from the query itself. Thanks again to Maslow for highlighting this.
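The iteration-count argument is not specific to LINQ. As an illustration only, here is the same exclusion written in Python for in-memory collections, where the ids of B are collected into a set once so that each item of A needs a single lookup. The Item class and the function absent_from are hypothetical names for this sketch.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Item:
    id: int
    label: str

def absent_from(a_items, b_items):
    """Items of A whose id does not occur in B (an anti-join on id)."""
    b_ids = {item.id for item in b_items}  # one pass over B
    return [item for item in a_items if item.id not in b_ids]  # one pass over A

A = [Item(1, "one"), Item(2, "two"), Item(3, "three")]
B = [Item(2, "two")]
missing = absent_from(A, B)
print([item.id for item in missing])  # [1, 3]
```

This trades a little memory for the set against avoiding a scan of B for every item of A.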
