Sequelize raw include/join - sql

I want to be able to do the following simple SQL query using Sequelize:
SELECT * FROM one
JOIN (SELECT COUNT(*) AS count, two_id FROM two GROUP BY two_id) AS table_two
ON one.two_id = table_two.two_id
I can't seem to find anything about a raw include or a raw model.
For performance reasons, I don't want a subselect in the main query (which I know Sequelize already handles well), i.e.:
SELECT one.*, (SELECT COUNT(*) FROM two WHERE two.two_id = one.two_id) AS count FROM one
Given the following Sequelize code (models One and Two exist):
models.One.findAll({
  include: [
    {
      model: models.Two,
      // what to add here in order to get the example SQL?
    },
  ],
});
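To make the two SQL shapes in the question concrete, here is a small sketch using Python's built-in sqlite3 module. The table and column names follow the question; the rows are made up, and the snippet says nothing about how Sequelize would generate either form. It only shows that the derived-table join and the correlated subselect return the same counts for this data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE one (id INTEGER PRIMARY KEY, two_id INTEGER);
    CREATE TABLE two (id INTEGER PRIMARY KEY, two_id INTEGER);
    INSERT INTO one (id, two_id) VALUES (1, 10), (2, 20);
    INSERT INTO two (two_id) VALUES (10), (10), (20);
""")

# Join against a pre-aggregated derived table (the shape asked for).
derived_join = conn.execute("""
    SELECT one.id, table_two.count
    FROM one
    JOIN (SELECT COUNT(*) AS count, two_id FROM two GROUP BY two_id) AS table_two
      ON one.two_id = table_two.two_id
    ORDER BY one.id
""").fetchall()

# Correlated subselect in the select list (the shape to avoid).
correlated = conn.execute("""
    SELECT one.id,
           (SELECT COUNT(*) FROM two WHERE two.two_id = one.two_id) AS count
    FROM one
    ORDER BY one.id
""").fetchall()
```

One difference to keep in mind: with an inner JOIN, rows of one that have no match in two are dropped, whereas the correlated subselect keeps them with a count of 0.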

Seems like I found a somewhat hacky workaround:
You can use fn inside selections to use any SQL word (like JOIN), resulting in something like this for my use case:
const { fn, literal } = require('sequelize');
models.One.findAll({
  attributes: [
    fn('JOIN', literal('SELECT COUNT(*) AS count FROM two WHERE one.two_id = two.two_id')),
  ],
});
Note that this only works on the last attribute (otherwise the JOIN ends up misplaced in the generated SQL).

Related

Filter a Postgres Table based on multiple columns

I'm working on a shopping website. The user selects multiple filters and sends the request to the backend, which is written in Node.js and uses Postgres as its database.
So I want to search the required data in a single query.
I have a JSON object containing all the filters that the user selected. I want to use them in a Postgres query and return the results to the user.
I have a Postgres table that contains a few products.
name          Category  Price
------------------------------
LOTR          Books     50
Harry Potter  Books     30
Iphone13      Mobile    1000
SJ8           Cameras   200
I want to filter the table using n number of filters in a single query.
I have to make it work for multiple filters such as the ones mentioned below. So I don't have to write multiple queries for different filters.
{ category: 'Books', price: '50' }
{ category: 'Books' }
{category : ['Books', 'Mobiles']}
I can query the table using
SELECT * FROM products WHERE category='Books' AND price=50
SELECT * FROM products WHERE category='Books'
SELECT * FROM products WHERE category='Books' OR category='Mobiles'
respectively.
But I want to write my query in such a way that it populates the keys and values dynamically, so that I don't have to write a separate query for every filter.
I have obtained the key and value pairs from req.query and saved them:
const params = req.query;
const keys: string = Object.keys(params).join(",")
const values: string[] = Object.values(params)
const indices = Object.keys(params).map((obj, i) => {
  return "$" + (i + 1)
})
But I'm unable to pass them in the query in a correct manner.
Does anybody have a suggestion for me? I'd highly appreciate any help.
Thank you in advance.
This is not the way you filter data from a SQL database table.
You need to use the NodeJS pg driver to connect to the database, then write a SQL query. I recommend prepared statements.
A query would look like:
SELECT * FROM my_table WHERE price < ...
Based on your question, at least, it is unclear to me why you would want to do these manipulations in JavaScript, or what you really want to accomplish.
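For what it's worth, here is one way the dynamic part could look. This is a minimal sketch in Python rather than Node.js, and it only builds the query text and the value list. The function name build_filter_query and the ALLOWED_COLUMNS whitelist are hypothetical; the numbered $1, $2 placeholders mirror the indices array from the question. A list value is turned into an = ANY(...) comparison, which assumes the driver sends the list as a Postgres array.

```python
from typing import Any

# Hypothetical whitelist: column names cannot be bound as parameters,
# so only known columns are ever interpolated into the SQL text.
ALLOWED_COLUMNS = {"name", "category", "price"}


def build_filter_query(filters: dict[str, Any]) -> tuple[str, list[Any]]:
    """Return (sql, values) with one numbered placeholder per filter."""
    clauses: list[str] = []
    values: list[Any] = []
    for column, value in filters.items():
        if column not in ALLOWED_COLUMNS:
            raise ValueError(f"unsupported filter: {column}")
        values.append(value)
        placeholder = f"${len(values)}"
        if isinstance(value, (list, tuple)):
            # A list means "any of these values" (Postgres array comparison).
            clauses.append(f"{column} = ANY({placeholder})")
        else:
            clauses.append(f"{column} = {placeholder}")
    sql = "SELECT * FROM products"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql, values
```

The same shape carries over to the pg driver, where the resulting text and values would be passed to client.query(text, values). The values never touch the SQL string, and the keys only do so after passing the whitelist.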

Hasura GraphQL query : where clause with value from related entity

I recently started using GraphQL through a Hasura layer on top of a PostgreSQL DB. I am having trouble implementing a basic query.
Here are my entities:
article:
  articleId
  content
  publishedDate
  categoryId
category:
  categoryId
  name
  createdDate
What I am trying to achieve, in English, is the following: get the articles that were published in the first 3 days following the creation of their related category.
In pseudo-SQL, I would do something similar to this:
SELECT Article.content, Category.name
FROM Article
INNER JOIN Category ON Article.categoryId = Category.categoryId
WHERE Article.publishedDate < Category.createdDate + 3 days
Now as a GraphQL query, I tried something similar to this:
query MyQuery {
articles(where: {publishedDate: {_lt: category.createdDate + 3 days}}) {
content
category {
name
}
}
}
Unfortunately, it does not recognise the “category.createdDate” in the where clause. I tried multiple variations, including aliases, with no success.
What am I missing?
As far as I understand the Hasura docs, there is no way to reference a field within a query the way you can in SQL. That does not mean you can't do what you are trying to do, though. There are three ways of achieving the result you want:
1. Create a filtered view
CREATE VIEW earliest_articles AS
SELECT Article.*
FROM Article
INNER JOIN Category ON Article.categoryId = Category.categoryId
WHERE Article.publishedDate < Category.createdDate + INTERVAL '3 days';
This view should now be available as a query. The Hasura docs cover views in more detail.
2. Create a view with a new field
CREATE VIEW articles_with_creation_span AS
SELECT
  Article.*,
  (Article.publishedDate - Category.createdDate) AS since_category_creation
FROM Article
INNER JOIN Category ON Article.categoryId = Category.categoryId;
Now you can query this view and filter on the extra field. This solution is useful if you want to vary the time span you query for. The downside is that there are now two different article types, so it might make sense to hide the regular articles query.
3. Use a computed field
You can also define computed fields in Hasura. These fields are not only included in the output object type of the corresponding GraphQL type, but can also be used for querying. Refer to the docs on how to create a computed field. Here, again, you can calculate the difference and then use a comparison operator (<) to check whether this time is smaller than '3 days'.
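To illustrate the second option (a view with an extra field that the caller filters on), here is a rough sketch using Python's sqlite3 as a stand-in for Postgres. The sample rows and the articles_within helper are made up, and SQLite's julianday() is used for the date difference because SQLite has no interval type. In Hasura the filter would be a where clause on the tracked view instead of a SQL parameter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE category (categoryId INTEGER PRIMARY KEY, name TEXT, createdDate TEXT);
    CREATE TABLE article (articleId INTEGER PRIMARY KEY, content TEXT,
                          publishedDate TEXT, categoryId INTEGER);
    INSERT INTO category VALUES (1, 'news', '2024-01-01');
    INSERT INTO article VALUES (1, 'early', '2024-01-02', 1),
                               (2, 'late',  '2024-01-10', 1);

    -- Option 2: expose the gap (in days) as an extra column on a view.
    CREATE VIEW articles_with_creation_span AS
    SELECT article.*,
           julianday(article.publishedDate) - julianday(category.createdDate)
               AS since_category_creation
    FROM article
    INNER JOIN category ON article.categoryId = category.categoryId;
""")


def articles_within(days: float) -> list[str]:
    """Contents of articles published fewer than `days` days after category creation."""
    rows = conn.execute(
        "SELECT content FROM articles_with_creation_span "
        "WHERE since_category_creation < ? ORDER BY articleId",
        (days,),
    )
    return [content for (content,) in rows]
```

Because the threshold is supplied at query time, the same view serves "within 3 days" and "within 30 days" without being redefined.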

Best practice for scaling SQL queries on joins?

I'm writing a REST api that works with SQL and am constantly finding myself in similar situations to this one, where I need to return lists of objects with nested lists inside each object by querying over table joins.
Let's say I have a many-to-many relationship between Users and Groups. I have a User table and a Group table and a junction table UserGroup between them. Now I want to write a REST endpoint that returns a list of users, and for each user the groups that they are enrolled in. I want to return JSON in a format like this:
[
  {
    "username": "test_user1",
    <other attributes ...>
    "groups": [
      {
        "group_id": 2,
        <other attributes ...>
      },
      {
        "group_id": 3,
        <other attributes ...>
      }
    ]
  },
  {
    "username": "test_user2",
    <other attributes ...>
    "groups": [
      {
        "group_id": 1,
        <other attributes ...>
      },
      {
        "group_id": 2,
        <other attributes ...>
      }
    ]
  },
  etc ...
There are two or three ways to query SQL for this that I can think of:
Issue a variable number of SQL queries: Query for a list of Users, then loop over each user to query over the junction linkage to populate the groups list for each user. The number of SQL queries linearly increases with the number of users returned.
example (using python flask_sqlalchemy / flask_restx):
users = db.session.query(User).filter( ... ).all()
result = []
for u in users:
    # One extra query per user
    groups = db.session.query(Group).join(UserGroup, UserGroup.group_id == Group.id) \
        .filter(UserGroup.user_id == u.id).all()
    result.append({**u.__dict__, 'groups': groups})
retobj = api.marshal(result, my_model)
# Total number of queries: 1 + number of users in result
Issue a constant number of SQL queries: This can be done by issuing one monolithic SQL query performing all joins with potentially lots of redundant data in the User's columns, or, often preferably, a few separate SQL queries. For example, query for a list of Users, then query the Group table joined on UserGroup, then manually assign the groups to their users in server code.
example code:
from collections import defaultdict
users = db.session.query(User).filter( ... ).all()
uids = [u.id for u in users]
groups = db.session.query(UserGroup.user_id, Group).join(Group, UserGroup.group_id == Group.id) \
    .filter(UserGroup.user_id.in_(uids))
aggregate = defaultdict(list)
for g in groups:
    # Each row is a (user_id, Group) pair
    aggregate[g.user_id].append(g[1].__dict__)
retobj = api.marshal([{**u.__dict__, 'groups': aggregate[u.id]} for u in users], my_model)
# Total number of queries: 2
The third approach, with limited usefulness, is to use string_agg or a similar function to make SQL concatenate a grouping into one string column, then unpack the string into a list server-side. For example, if all I wanted was the group number, I could use string_agg and group_by to get back "1,2" in one query to the User table. But this is only useful if you don't need complex objects.
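As a rough illustration of that third approach, here is a sketch using Python's sqlite3, where group_concat plays the role that string_agg plays in Postgres. The table layout and rows are invented to match the JSON example above, and only the group ids are returned.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user (id INTEGER PRIMARY KEY, username TEXT);
    CREATE TABLE user_group (user_id INTEGER, group_id INTEGER);
    INSERT INTO user VALUES (1, 'test_user1'), (2, 'test_user2');
    INSERT INTO user_group VALUES (1, 2), (1, 3), (2, 1), (2, 2);
""")

# One query: each user row carries its group ids as a comma-separated string.
rows = conn.execute("""
    SELECT u.username, group_concat(ug.group_id) AS group_ids
    FROM user AS u
    LEFT JOIN user_group AS ug ON ug.user_id = u.id
    GROUP BY u.id, u.username
    ORDER BY u.id
""").fetchall()

# Unpack the string into a list of ints in application code.
# The concatenation order is not guaranteed, so the ids are sorted here.
users = [
    {"username": name, "group_ids": sorted(int(g) for g in ids.split(",")) if ids else []}
    for name, ids in rows
]
```

This keeps the query count at one, but as noted it stops being practical as soon as each group needs more than a single scalar value.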
I'm attracted to the second approach because I feel like it's more efficient and scalable because the number of SQL queries (which I have assumed is the main bottleneck for no particularly good reason) is constant, but it takes some more work on the server's side to filter all the groups into each user. But I thought part of the point of using SQL is to take advantage of its efficient sorting/filtering so you don't have to do it yourself.
So my question is, am I right in thinking that it's a good idea to make the number of SQL queries constant at the expense of more server-side processing and dev time? Is it a waste of time to try to whittle down the number of unnecessary SQL queries? Will I regret it if I don't, when the API is tested at scale? Is there a better way to solve this problem that I'm not aware of?
Using the joinedload option, you can load all the data with just one query:
q = (
    session.query(User)
    .options(db.joinedload(User.groups))
    .order_by(User.id)
)
users = q.all()
for user in users:
    print(user.name)
    for ug in user.groups:
        print(" ", ug.name)
When you run the query above, all the groups will already have been loaded from the database by a query similar to the one below:
SELECT "user".id,
"user".name,
group_1.id,
group_1.name
FROM "user"
LEFT OUTER JOIN (user_group AS user_group_1
JOIN "group" AS group_1 ON group_1.id = user_group_1.group_id)
ON "user".id = user_group_1.user_id
And now you only need to serialize the result with proper schema.

Aggregate multiple columns without groupBy in Slick 2.0

I would like to perform an aggregation with Slick that executes SQL like the following:
SELECT MIN(a), MAX(a) FROM table_a;
where table_a has an INT column a.
In Slick given the table definition:
class A(tag: Tag) extends Table[Int](tag, "table_a") {
  def a = column[Int]("a")
  def * = a
}
val A = TableQuery[A]
val as = A.map(_.a)
It seems like I have 2 options:
1. Write something like: Query(as.min, as.max)
2. Write something like:
as
  .groupBy(_ => 1)
  .map { case (_, as) => (as.map(identity).min, as.map(identity).max) }
However, the generated SQL is not good in either case. In option 1, two separate sub-selects are generated, which is like writing two separate queries. In option 2, the following is generated:
select min(x2."a"), max(x2."a") from "table_a" x2 group by 1
However, this syntax is not correct for Postgres (it groups by the first column value, which is invalid in this case). Indeed AFAIK it is not possible to group by a constant value in Postgres, except by omitting the group by clause.
Is there a way to cause Slick to emit a single query with both aggregates without the GROUP BY?
The syntax error is a bug. I created a ticket: https://github.com/slick/slick/issues/630
The subqueries are a limitation of Slick's SQL compiler currently producing non-optimal code in this case. We are working on improving the situation.
As a workaround, here is a pattern to swap out the generated SQL under the hood and leave everything else intact: https://gist.github.com/cvogt/8054159
I use the following trick in SQL Server, and it seems to work in Postgres:
select min(x2."a"), max(x2."a")
from "table_a" x2
group by (case when x2.a = x2.a then 1 else 1 end);
The use of the variable in the group by expression tricks the compiler into thinking that there could be more than one group.
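A quick way to see how the plain aggregate and the GROUP BY trick relate is to run both side by side. The sketch below uses Python's sqlite3 with made-up rows, so it only demonstrates the SQL semantics and says nothing about what Slick emits or about Postgres-specific parsing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (a INTEGER);
    INSERT INTO table_a VALUES (3), (1), (7);
""")

# No GROUP BY: the whole table forms a single implicit group.
plain = conn.execute("SELECT MIN(a), MAX(a) FROM table_a").fetchall()

# Workaround shape: group by an expression that is constant but mentions a column.
tricked = conn.execute("""
    SELECT MIN(x2.a), MAX(x2.a)
    FROM table_a x2
    GROUP BY (CASE WHEN x2.a = x2.a THEN 1 ELSE 1 END)
""").fetchall()

# On an empty table the two differ: no GROUP BY yields one row of NULLs,
# while any GROUP BY yields no rows at all.
conn.execute("DELETE FROM table_a")
plain_empty = conn.execute("SELECT MIN(a), MAX(a) FROM table_a").fetchall()
tricked_empty = conn.execute("""
    SELECT MIN(x2.a), MAX(x2.a)
    FROM table_a x2
    GROUP BY (CASE WHEN x2.a = x2.a THEN 1 ELSE 1 END)
""").fetchall()
```

So the trick is equivalent for non-empty tables, but code consuming the result should be prepared for zero rows when the table is empty.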

Filtering simultaneously on count of related objects and on count of related objects that satisfy a condition in Django

So I have models amounting to this (very simplified, obviously):
class Mystery(models.Model):
    name = models.CharField(max_length=100)
class Character(models.Model):
    mystery = models.ForeignKey(Mystery, related_name="characters")
    required = models.BooleanField(default=True)
Basically, in each mystery there are a number of characters, which can be essential to the story or not. The minimum number of actors that can stage a mystery is the number of required characters for that mystery; the maximum number is the number of characters total for the mystery.
Now I'm trying to query for mysteries that can be played by some given number of actors. It seemed straightforward enough using the way Django's filtering and annotation features function; after all, both of these queries work fine:
# Returns mystery objects with at least x characters in all
Mystery.objects.annotate(max_actors=Count('characters', distinct=True)).filter(max_actors__gte=x)
# Returns mystery objects with no more than x required characters
Mystery.objects.filter(characters__required=True).annotate(min_actors=Count('characters', distinct=True)).filter(min_actors__lte=x)
However, when I try to combine the two...
Mystery.objects.annotate(max_actors=Count('characters', distinct=True)).filter(characters__required=True).annotate(min_actors=Count('characters', distinct=True)).filter(min_actors__lte=x, max_actors__gte=x)
...it doesn't work. Both min_actors and max_actors come out containing the maximum number of actors. The relevant parts of the actual query being run look like this:
SELECT `mysteries_mystery`.`id`,
`mysteries_mystery`.`name`,
COUNT(DISTINCT `mysteries_character`.`id`) AS `max_actors`,
COUNT(DISTINCT `mysteries_character`.`id`) AS `min_actors`
FROM `mysteries_mystery`
LEFT OUTER JOIN `mysteries_character` ON (`mysteries_mystery`.`id` = `mysteries_character`.`mystery_id`)
INNER JOIN `mysteries_character` T5 ON (`mysteries_mystery`.`id` = T5.`mystery_id`)
WHERE T5.`required` = True
GROUP BY `mysteries_mystery`.`id`, `mysteries_mystery`.`name`
...which makes it clear that while Django is creating a second join on the character table just fine (the second copy of the table being aliased to T5), that table isn't actually being used anywhere and both of the counts are being selected from the non-aliased version, which obviously yields the same result both times.
Even when I try to use an extra clause to select from T5, I get told there is no such table as T5, even as examining the output query shows that it's still aliasing the second character table to T5. Another attempt to do this with extra clauses went like this:
Mystery.objects.annotate(max_actors=Count('characters', distinct=True)).extra(select={'min_actors': "SELECT COUNT(*) FROM mysteries_character WHERE required = True AND mystery_id = mysteries_mystery.id"}).extra(where=["`min_actors` <= %s", "`max_actors` >= %s"], params=[x, x])
But that didn't work because I can't use a calculated field in the WHERE clause, at least on MySQL. If only I could use HAVING, but alas, Django's .extra() does not and will never allow you to set HAVING parameters.
Is there any way to get Django's ORM to do what I want?
How about combining your Count()s:
Mystery.objects.annotate(max_actors=Count('characters', distinct=True),min_actors=Count('characters', distinct=True)).filter(characters__required=True).filter(min_actors__lte=x, max_actors__gte=x)
This seems to work for me but I didn't test it with your exact models.
It's been a couple of weeks with no suggested solutions, so here's how I ended up going about it, for anyone else who might be looking for an answer:
Mystery.objects.annotate(max_actors=Count('characters', distinct=True)).filter(max_actors__gte=x, id__in=Mystery.objects.filter(characters__required=True).annotate(min_actors=Count('characters', distinct=True)).filter(min_actors__lte=x).values('id'))
In other words, filter on the first count and on IDs that match those in an explicit subquery that filters on the second count. Kind of clunky, but it works well enough for my purposes.
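For anyone who wants to see the shape of that query outside the ORM, here is a sketch of roughly equivalent SQL run through Python's sqlite3. The table names are taken from the generated SQL in the question; the rows and the playable_by helper are made up, and the SQL Django actually emits for the queryset above will differ in detail.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE mysteries_mystery (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE mysteries_character (
        id INTEGER PRIMARY KEY, mystery_id INTEGER, required INTEGER);
    INSERT INTO mysteries_mystery VALUES (1, 'small'), (2, 'flexible'), (3, 'large');
    -- small: 2 required; flexible: 2 required + 3 optional; large: 6 required
    INSERT INTO mysteries_character (mystery_id, required) VALUES
        (1, 1), (1, 1),
        (2, 1), (2, 1), (2, 0), (2, 0), (2, 0),
        (3, 1), (3, 1), (3, 1), (3, 1), (3, 1), (3, 1);
""")


def playable_by(x: int) -> list[str]:
    """Names of mysteries whose required count <= x <= total character count."""
    rows = conn.execute("""
        SELECT m.name
        FROM mysteries_mystery AS m
        JOIN mysteries_character AS c ON c.mystery_id = m.id
        WHERE m.id IN (
            -- inner query: at most x required characters
            SELECT mystery_id FROM mysteries_character
            WHERE required = 1
            GROUP BY mystery_id
            HAVING COUNT(*) <= ?
        )
        GROUP BY m.id, m.name
        HAVING COUNT(c.id) >= ?   -- outer query: at least x characters in total
        ORDER BY m.id
    """, (x, x))
    return [name for (name,) in rows]
```

Each count gets its own GROUP BY, which is why the two no longer collapse into the same number. Note that, like the queryset, the inner query skips mysteries that have no required characters at all.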