Query with a sub query that requires multiple values - sql

I can't really think of a title so let me explain the problem:
Problem: I want to return an array of Posts, each containing a like count. The like count is per post, counted across all users who have liked that post.
For example:
const posts = [
{
post_id: 1,
like_count: 100
},
{
post_id: 2,
like_count: 50
}
]
Now with my current solution, I don't think it's possible but here is what I have so far.
My query currently looks like this (produced by TypeORM):
SELECT
"p"."uid" AS "p_uid",
"p"."created_at" AS "post_created_at",
"l"."uid" AS "like_uid",
"l"."post_liked" AS "post_liked",
"ph"."path" AS "path",
"ph"."title" AS "photo_title",
"u"."name" AS "post_author",
(
SELECT
COUNT(like_id) AS "like_count"
FROM
"likes" "l"
INNER JOIN
"posts" "p"
ON "p"."post_id" = "l"."post_id"
WHERE
"l"."post_liked" = true
AND l.post_id = $1
)
AS "like_count"
FROM
"posts" "p"
LEFT JOIN
"likes" "l"
ON "l"."post_id" = "p"."post_id"
INNER JOIN
"photos" "ph"
ON "ph"."photo_id" = "p"."photo_id"
INNER JOIN
"users" "u"
ON "u"."user_id" = "p"."user_id"
$1 is where the post's post_id should go (for testing I hard-coded the first post's id there); assume I have an array of post_ids ready to pass in.
My TypeORM query looks like this
async findAll(): Promise<Post[]> {
return await getRepository(Post)
.createQueryBuilder('p')
.select(['p.uid'])
.addSelect(subQuery =>
subQuery
.select('COUNT(like_id)', 'like_count')
.from(Like, 'l')
.innerJoin('l.post', 'p')
.where('l.post_liked = true AND l.post_id = :post_id', {post_id: 'a16f0c3e-5aa0-4cf8-82da-dfe27d3f991a'}), 'like_count'
)
.addSelect('p.created_at', 'post_created_at')
.addSelect('u.name', 'post_author')
.addSelect('l.uid', 'like_uid')
.addSelect('l.post_liked', 'post_liked')
.addSelect('ph.title', 'photo_title')
.addSelect('ph.path', 'path')
.leftJoin('p.likes', 'l')
.innerJoin('p.photo', 'ph')
.innerJoin('p.user', 'u')
.getRawMany()
}
Why am I doing this? What I am trying to avoid is calling count for every single post on my page to return the number of likes for each post. I thought I could somehow do this in a subquery but now I am not sure if it's possible.
Can someone suggest a more efficient way of doing something like this? Or is this approach completely wrong?

I find working with ORMs terrible and cannot help you with this. But the query itself has flaws:
You want one row per post, but you are joining likes, thus getting one row per post and like.
Your subquery is not related to your main query. It should instead relate to the main query's post.
The corrected query:
SELECT
p.uid,
p.created_at,
ph.path AS photo_path,
ph.title AS photo_title,
u.name AS post_author,
(
SELECT COUNT(*)
FROM likes l
WHERE l.post_id = p.post_id
AND l.post_liked = true
) AS like_count
FROM posts p
JOIN photos ph ON ph.photo_id = p.photo_id
JOIN users u ON u.user_id = p.user_id
ORDER BY p.uid;
I suppose it's quite easy for you to convert this to TypeORM. There is nothing wrong with counting for every single post, by the way. It is even necessary to get the result you are after.
The subquery could also be moved to the FROM clause using GROUP BY l.post_id within. As is, you are getting all posts, regardless of them having likes or not. By moving the subquery to the FROM clause, you could instead decide between INNER JOIN and LEFT OUTER JOIN.
The query would benefit from the following index:
CREATE INDEX idx ON likes (post_id, post_liked);
Add this index if the query turns out to be too slow.
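To compare the two shapes mentioned above (a correlated subquery in the SELECT list versus a grouped subquery in the FROM clause), here is a minimal sketch using Python's built-in sqlite3 module. The sample rows, the reduced column set and the function name like_counts are assumptions made for illustration; booleans are stored as 0/1 because SQLite has no boolean type.

```python
import sqlite3

def like_counts():
    """Return (correlated_rows, grouped_rows) for a tiny in-memory dataset."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE posts (post_id INTEGER PRIMARY KEY, uid TEXT);
        CREATE TABLE likes (like_id INTEGER PRIMARY KEY,
                            post_id INTEGER, post_liked INTEGER);
        INSERT INTO posts VALUES (1, 'a'), (2, 'b'), (3, 'c');
        -- post 1: two likes; post 2: one like and one un-like; post 3: none
        INSERT INTO likes (post_id, post_liked)
        VALUES (1, 1), (1, 1), (2, 1), (2, 0);
    """)
    # Variant 1: correlated subquery, evaluated once per post row.
    correlated = con.execute("""
        SELECT p.uid,
               (SELECT COUNT(*) FROM likes l
                 WHERE l.post_id = p.post_id
                   AND l.post_liked = 1) AS like_count
          FROM posts p
         ORDER BY p.uid
    """).fetchall()
    # Variant 2: aggregate once in the FROM clause, then LEFT JOIN so that
    # posts without likes are kept (COALESCE turns the NULL into 0).
    grouped = con.execute("""
        SELECT p.uid, COALESCE(lc.like_count, 0) AS like_count
          FROM posts p
          LEFT JOIN (SELECT post_id, COUNT(*) AS like_count
                       FROM likes
                      WHERE post_liked = 1
                      GROUP BY post_id) lc ON lc.post_id = p.post_id
         ORDER BY p.uid
    """).fetchall()
    con.close()
    return correlated, grouped
```

Both variants return one row per post. Swapping the LEFT JOIN for an INNER JOIN in the second variant would drop posts that have no likes.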

Related

What .merge() is doing in this query?

I'm a little confused about how to interpret this query, all because of merge situation, even after reading the documentation.
I would like to know what is the corresponding SQL query for the below
Analise.joins(dape: [empresa: :area_atuacao])
.merge(@dapes)
.where(analises: { atual: true })
.pluck('analises.img')
Output from calling to_sql on this query:
=> "SELECT \"analises\".*
FROM \"analises\"
INNER JOIN \"dapes\" ON \"dapes\".\"id\" = \"analises\".\"dape_id\"
INNER JOIN \"empresas\" ON \"empresas\".\"id\" = \"dapes\".\"empresa_id\"
INNER JOIN \"areas_atuacao\" ON \"areas_atuacao\".\"id\" = \"empresas\".\"area_atuacao_id\"
WHERE \"analises\".\"atual\" = 't'"
merge(other)
Merges in the conditions from other, if other is an ActiveRecord::Relation.
- Rails API Docs
A common use is merging search conditions together:
@cities = City.all
@cities = @cities.merge(City.where(country: params[:country])) if params[:country]
@cities = @cities.merge(City.where(name: params[:name])) if params[:name]
You can also use it to create conditions on joined tables like in this example from the docs:
Post.where(published: true)
.joins(:comments)
.merge( Comment.where(spam: false) )
This creates the same query as:
Post.where(published: true)
.joins(:comments)
.where(comments: { spam: false })
The exact query in your example depends on the scope defined in the instance variable @dapes. But judging from the generated SQL, .merge(@dapes) seems to do nothing. This could be the case if @dapes = Dape.all, for example.
Merging a condition with no where clause does nothing:
irb(main):003:0> User.merge(User.all)
User Load (0.6ms) SELECT "users".* FROM "users" LIMIT ? [["LIMIT", 11]]
=> #<ActiveRecord::Relation []>
irb(main):004:0>
Here is your query, formatted for better readability. The merge is used to carry over your conditions so that nothing gets overridden.
SELECT analises.*
FROM analises
INNER JOIN dapes ON dapes.id = analises.dape_id
INNER JOIN empresas ON empresas.id = dapes.empresa_id
INNER JOIN areas_atuacao ON areas_atuacao.id = empresas.area_atuacao_id
WHERE analises.atual = 't'
It seems .merge() is used when you are joining tables, to be more specific about what exactly you are joining.
In this case, .merge(@dapes) seems to merge whatever conditions @dapes carries into the query.
One way to get a better understanding of the impact .merge(@dapes) has on your query is to run to_sql again without it and compare how the SQL changes.
Footnote
I took the sql that was generated from the first to_sql you ran and entered it into the Scuttle Editor and got the following rails commands. I don't know if this helps but I just thought it was food for thought!
Analise.select(Analise.arel_table[Arel.star]).where(Analise.arel_table[:atual].eq('t')).joins(
Analise.arel_table.join(Dape.arel_table).on(
Dape.arel_table[:id].eq(Analise.arel_table[:dape_id])
).join_sources
).joins(
Analise.arel_table.join(Empresa.arel_table).on(
Empresa.arel_table[:id].eq(Dape.arel_table[:empresa_id])
).join_sources
).joins(
Analise.arel_table.join(AreasAtuacao.arel_table).on(
AreasAtuacao.arel_table[:id].eq(Empresa.arel_table[:area_atuacao_id])
).join_sources
)

Using LATERAL joins in Ecto v2.0

I'm trying to join the latest comment on a post record, like so:
comment = from c in Comment, order_by: [desc: c.inserted_at], limit: 1
post = Repo.all(
from p in Post,
where: p.id == 123,
join: c in subquery(comment), on: c.post_id == p.id,
select: [p.title, c.body],
limit: 1
)
Which generates this SQL:
SELECT p0."title",
c1."body"
FROM "posts" AS p0
INNER JOIN (SELECT p0."id",
p0."body",
p0."inserted_at",
p0."updated_at"
FROM "comments" AS p0
ORDER BY p0."inserted_at" DESC
LIMIT 1) AS c1
ON c1."post_id" = p0."id"
WHERE ( p0."id" = 123 )
LIMIT 1
It just returns nil. If I remove the on: c.post_id == p.id it'll return data, but obviously it'll return the latest comment across all posts, not for the post in question.
What am I doing wrong? A fix could be to use a LATERAL join subquery, but I can't figure out whether it's possible to pass the p reference into a subquery.
Thanks!
The issue was caused by the limit: 1 here:
comment = from c in Comment, order_by: [desc: c.inserted_at], limit: 1
Since the resulting query was SELECT * FROM "comments" AS p0 ORDER BY p0."inserted_at" DESC LIMIT 1, it was only returning the most recent comment on ANY post, not the post I was querying against.
FYI the query was >150ms with ~200,000 comment rows, but that was brought down to ~12ms with a simple index:
create index(:comments, ["(inserted_at::date) DESC"])
It's worth noting that while this query works in returning the post in question and only the most recent comment, it'll actually return $number_of_comments rows if you remove the limit: 1. So say if you wanted to retrieve all 100 posts in your database with the most recent comment of each, and you had 200,000 comments in the database, this query would return 200,000 rows. Instead you should use a LATERAL join as discussed below.
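To see why joining against a globally limited subquery comes back empty, here is a small sketch using Python's built-in sqlite3 module. The sample rows and the function name latest_comment_demo are made up for illustration. SQLite has no LATERAL, so the per-post variant uses a correlated scalar subquery as a stand-in; it shows the same idea (pick the latest comment per post) but is not the LATERAL query itself.

```python
import sqlite3

def latest_comment_demo():
    """Return (rows_with_global_limit, rows_with_per_post_lookup)."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT);
        CREATE TABLE comments (id INTEGER PRIMARY KEY, post_id INTEGER,
                               body TEXT, inserted_at TEXT);
        INSERT INTO posts VALUES (123, 'first'), (456, 'second');
        INSERT INTO comments (post_id, body, inserted_at) VALUES
            (123, 'old on 123', '2016-01-01'),
            (123, 'new on 123', '2016-01-02'),
            (456, 'newest overall', '2016-01-03');
    """)
    # The subquery picks the single newest comment of ANY post (it belongs
    # to post 456), so the join condition for post 123 never matches.
    global_limit = con.execute("""
        SELECT p.title, c.body
          FROM posts p
          JOIN (SELECT * FROM comments
                 ORDER BY inserted_at DESC LIMIT 1) c
            ON c.post_id = p.id
         WHERE p.id = 123
    """).fetchall()
    # Correlated lookup: the LIMIT 1 is applied per post, not globally.
    per_post = con.execute("""
        SELECT p.title, c.body
          FROM posts p
          JOIN comments c
            ON c.id = (SELECT c2.id FROM comments c2
                        WHERE c2.post_id = p.id
                        ORDER BY c2.inserted_at DESC LIMIT 1)
         WHERE p.id = 123
    """).fetchall()
    con.close()
    return global_limit, per_post
```

With these rows the first query yields no rows at all, which matches the nil result in the question, while the second yields the newest comment of post 123.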
Update
Unfortunately Ecto doesn't support LATERAL joins right now.
An ecto fragment would work great here, however the join query wraps the fragment in additional parentheses (i.e. INNER JOIN (LATERAL (SELECT …))), which isn't valid SQL, so you'd have to use raw SQL for now:
sql = """
SELECT p."title",
c."body"
FROM "posts" AS p
INNER JOIN LATERAL (SELECT c."id",
c."body",
c."inserted_at"
FROM "comments" AS c
WHERE ( c."post_id" = p."id" )
ORDER BY c."inserted_at" DESC
LIMIT 1) AS c
ON true
WHERE ( p."id" = 123 )
LIMIT 1
"""
res = Ecto.Adapters.SQL.query!(Repo, sql, [])
This query returns in <1ms on the same database.
Note this doesn't return your Ecto model struct, just the raw response from Postgrex.

postgres: filter out some results with join

I’m trying to construct a SQL query (I’m running Postgres) that filters out some posts.
Let’s say I’ve got a User model (id, name, and so on), a Blacklist model (id, user_id, blacklisted_user_id) and a Post model (id, author_id, title, and so on).
Imagine that user A (id=5) blocks user B (id=10).
Neither user A nor user B should see the other’s posts.
I’m trying a query which looks something like this:
SELECT posts.* FROM "posts"
LEFT JOIN blacklists b ON (b.user_id = posts.author_id OR
b.blacklisted_user_id = posts.author_id)
WHERE (b.user_id = 5 AND b.blacklisted_user_id = posts.author_id) OR
(b.user_id = posts.author_id AND b.blacklisted_user_id = 5)
However, result is exactly opposite that I need: I’m getting only posts from blacklisted user.
When I use b.user_id != 5, I’m getting empty response.
You're repeating the blacklisted_user_id=posts.author_id which is unnecessary.
Then, you presumably want posts that don't match the blacklist. Something like:
SELECT posts.* FROM posts
LEFT JOIN blacklists b ON (b.user_id = posts.author_id OR
b.blacklisted_user_id = posts.author_id)
WHERE posts.author_id = 5 AND b.user_id IS NULL
Is that the sort of thing you wanted?
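One way to read the requirement is as an anti-join: for a given viewer, keep a post only if no blacklist row links the viewer and the post's author in either direction. The following is a hedged sketch of that reading using Python's built-in sqlite3 module; the sample rows, the viewer parameter and the function name visible_posts are assumptions for illustration, not part of the question or the answer above.

```python
import sqlite3

def visible_posts(viewer_id):
    """Return titles of posts the viewer may see, given a small sample dataset."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE posts (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
        CREATE TABLE blacklists (id INTEGER PRIMARY KEY,
                                 user_id INTEGER, blacklisted_user_id INTEGER);
        INSERT INTO posts VALUES (1, 5, 'by A'), (2, 10, 'by B'), (3, 7, 'by C');
        -- user A (id=5) blocks user B (id=10)
        INSERT INTO blacklists (user_id, blacklisted_user_id) VALUES (5, 10);
    """)
    # The LEFT JOIN looks for a block between viewer and author in either
    # direction; "b.id IS NULL" keeps only posts where no such row was found.
    rows = con.execute("""
        SELECT posts.title
          FROM posts
          LEFT JOIN blacklists b
            ON (b.user_id = :viewer AND b.blacklisted_user_id = posts.author_id)
            OR (b.user_id = posts.author_id AND b.blacklisted_user_id = :viewer)
         WHERE b.id IS NULL
         ORDER BY posts.id
    """, {"viewer": viewer_id}).fetchall()
    con.close()
    return [title for (title,) in rows]
```

The viewer's id goes into the join condition rather than the WHERE clause, so the IS NULL test means "no blacklist row between this viewer and this author", not "no blacklist row at all".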

How to get Django QuerySet 'exclude' to work right?

I have a database that contains schemas for skus, kits, kit_contents, and checklists. Here is a query for "Give me all the SKUs defined for kitcontent records defined for kit records defined in checklist 1":
SELECT DISTINCT s.* FROM skus s
JOIN kit_contents kc ON kc.sku_id = s.id
JOIN kits k ON k.id = kc.kit_id
JOIN checklists c ON k.checklist_id = 1;
I'm using Django, and I mostly really like the ORM because I can express that query by:
skus = SKU.objects.filter(kitcontent__kit__checklist_id=1).distinct()
which is such a slick way to navigate all those foreign keys. Django's ORM produces basically the same as the SQL written above. The trouble is that it's not clear to me how to get all the SKUs not defined for checklist 1. In the SQL query above, I'd do this by replacing the "=" with "!=". But Django's models don't have a not equals operator. You're supposed to use the exclude() method, which one might guess would look like this:
skus = SKU.objects.filter().exclude(kitcontent__kit__checklist_id=1).distinct()
but Django produces this query, which isn't the same thing:
SELECT DISTINCT s.* FROM skus s
WHERE NOT ((s.id IN
(SELECT kc.sku_id FROM kit_contents kc
INNER JOIN kits k ON (kc.kit_id = k.id)
WHERE (k.checklist_id = 1 AND kc.sku_id IS NOT NULL))
AND s.id IS NOT NULL))
(I've cleaned up the query for easier reading and comparison.)
I'm a beginner to the Django ORM, and I'd like to use it when possible. Is there a way to get what I want here?
EDIT:
karthikr gave an answer that doesn't work for the same reason the original ORM .exclude() solution doesn't work: a SKU can be in kit_contents in kits that exist on both checklist_id=1 and checklist_id=2. Using the by-hand query I opened my post with, using "checklist_id = 1" produces 34 results, using "checklist_id = 2" produces 53 results, and the following query produces 26 results:
SELECT DISTINCT s.* FROM skus s
JOIN kit_contents kc ON kc.sku_id = s.id
JOIN kits k ON k.id = kc.kit_id
JOIN checklists c ON k.checklist_id = 1
JOIN kit_contents kc2 ON kc2.sku_id = s.id
JOIN kits k2 ON k2.id = kc2.kit_id
JOIN checklists c2 ON k2.checklist_id = 2;
I think this is one reason why people don't seem to find the .exclude() solution a reasonable replacement for some kind of not_equals filter -- the latter allows you to say, succinctly, exactly what you mean. Presumably the former could also allow the query to be expressed, but I increasingly despair of such a solution being simple.
You could do this: get all the objects for checklist 1, and exclude them from the complete list.
sku_ids = skus.values_list('pk', flat=True)
non_checklist_1 = SKU.objects.exclude(pk__in=sku_ids).distinct()
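The disagreement in this thread comes down to two different questions: "SKUs that appear in some kit whose checklist is not 1" (the != join) versus "SKUs that appear in no kit on checklist 1" (what exclude() and the pk__in approach express). A small sketch with Python's built-in sqlite3 module makes the difference visible; the sample rows and the function name compare_not_equal_and_exclude are invented for illustration.

```python
import sqlite3

def compare_not_equal_and_exclude():
    """Return (names_from_not_equal_join, names_from_not_in) on sample data."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE skus (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE kits (id INTEGER PRIMARY KEY, checklist_id INTEGER);
        CREATE TABLE kit_contents (id INTEGER PRIMARY KEY,
                                   kit_id INTEGER, sku_id INTEGER);
        INSERT INTO skus VALUES (1, 'only-1'), (2, 'both'),
                                (3, 'only-2'), (4, 'unused');
        INSERT INTO kits VALUES (10, 1), (20, 2);
        INSERT INTO kit_contents (kit_id, sku_id)
        VALUES (10, 1), (10, 2), (20, 2), (20, 3);
    """)
    # "!=" on the join: SKUs with at least one kit on another checklist.
    not_equal = con.execute("""
        SELECT DISTINCT s.name
          FROM skus s
          JOIN kit_contents kc ON kc.sku_id = s.id
          JOIN kits k ON k.id = kc.kit_id
         WHERE k.checklist_id != 1
         ORDER BY s.name
    """).fetchall()
    # NOT IN: SKUs with no kit on checklist 1 at all.
    excluded = con.execute("""
        SELECT s.name
          FROM skus s
         WHERE s.id NOT IN (SELECT kc.sku_id
                              FROM kit_contents kc
                              JOIN kits k ON k.id = kc.kit_id
                             WHERE k.checklist_id = 1)
         ORDER BY s.name
    """).fetchall()
    con.close()
    return [n for (n,) in not_equal], [n for (n,) in excluded]
```

The SKU named 'both' shows up only in the first result and the SKU named 'unused' only in the second, so which query is "right" depends on which of the two questions is being asked.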

Joining tables, counting, and group to return a Model

So I've got a SQL query I'd like to duplicate in rails:
select g.*
from gamebox_favorites f
inner join gameboxes g on f.gamebox_id = g.id
group by f.gamebox_id
order by count(f.gamebox_id) desc;
I've been reading over the rails Active Record Query Interface site, but can't quite seem to put this together. I'd like the query to return a collection of Gamebox records, sorted by the number of 'favorites' a gamebox has. What is the cleanest way to do this in rails?
I believe this will work (it works on a similarly structured database locally), though I'm not sure I have the proper models in the proper spots for what you're trying to do, so you might need to move a couple of things around:
Gamebox.joins(:gamebox_favorites).
group('"gamebox_favorites"."gamebox_id"').
order('count("gamebox_favorites"."gamebox_id") DESC')
On the console, this should compile to (in the case of PostgreSQL on the back end):
SELECT "gameboxes".* FROM "gameboxes"
INNER JOIN "gamebox_favorites"
ON "gamebox_favorites"."gamebox_id" = "gameboxes"."id"
GROUP BY "gamebox_favorites"."gamebox_id"
ORDER BY count("gamebox_favorites"."gamebox_id") DESC
...and I'm guessing that you don't want to just wrap it in a find_by_sql call, such as:
Gamebox.find_by_sql("select g.* from gamebox_favorites f
inner join gameboxes g
on f.gamebox_id = g.id
group by f.gamebox_id
order by count(f.gamebox_id) desc")
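To check what the join, group and order combination returns, here is a minimal sketch using Python's built-in sqlite3 module. The name column, the sample rows and the function name gameboxes_by_favorites are assumptions for illustration. The sketch groups by the gameboxes primary key, which SQLite accepts as written; stricter databases have their own rules about which selected columns must appear in GROUP BY.

```python
import sqlite3

def gameboxes_by_favorites():
    """Return (name, favorite_count) pairs, most-favorited first."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE gameboxes (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE gamebox_favorites (id INTEGER PRIMARY KEY,
                                        gamebox_id INTEGER);
        INSERT INTO gameboxes VALUES (1, 'chess'), (2, 'go'), (3, 'risk');
        -- chess: 1 favorite, go: 3 favorites, risk: 2 favorites
        INSERT INTO gamebox_favorites (gamebox_id)
        VALUES (1), (2), (2), (2), (3), (3);
    """)
    rows = con.execute("""
        SELECT g.name, COUNT(f.gamebox_id) AS favorites
          FROM gamebox_favorites f
          JOIN gameboxes g ON f.gamebox_id = g.id
         GROUP BY g.id
         ORDER BY favorites DESC
    """).fetchall()
    con.close()
    return rows
```

Because of the inner join, a gamebox with no favorites would not appear in the result at all.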