Using LATERAL joins in Ecto v2.0

I'm trying to join the latest comment on a post record, like so:
comment = from c in Comment, order_by: [desc: c.inserted_at], limit: 1
post = Repo.all(
from p in Post,
where: p.id == 123,
join: c in subquery(comment), on: c.post_id == p.id,
select: [p.title, c.body],
limit: 1
)
Which generates this SQL:
SELECT p0."title",
c1."body"
FROM "posts" AS p0
INNER JOIN (SELECT p0."id",
p0."body",
p0."inserted_at",
p0."updated_at"
FROM "comments" AS p0
ORDER BY p0."inserted_at" DESC
LIMIT 1) AS c1
ON c1."post_id" = p0."id"
WHERE ( p0."id" = 123 )
LIMIT 1
It just returns nil. If I remove the on: c.post_id == p.id it returns data, but obviously it returns the latest comment across all posts, not the latest comment for the post in question.
What am I doing wrong? A fix could be to use a LATERAL join subquery, but I can't figure out whether it's possible to pass the p reference into a subquery.
Thanks!

The issue was caused by the limit: 1 here:
comment = from c in Comment, order_by: [desc: c.inserted_at], limit: 1
Since the resulting query was SELECT * FROM "comments" AS p0 ORDER BY p0."inserted_at" DESC LIMIT 1, it was only returning the most recent comment on ANY post, not the post I was querying against.
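The effect is easy to reproduce outside Ecto. The following is a minimal sketch using Python's sqlite3 module as a stand-in for PostgreSQL; the table contents are made up for illustration, and only the shape of the query mirrors the one above.

```python
import sqlite3

# In-memory database with hypothetical sample data (not from the original post).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE comments (
        id INTEGER PRIMARY KEY, post_id INTEGER, body TEXT, inserted_at TEXT
    );
    INSERT INTO posts VALUES (123, 'first'), (124, 'second');
    INSERT INTO comments VALUES
        (1, 123, 'old comment on 123', '2016-01-01'),
        (2, 123, 'new comment on 123', '2016-01-02'),
        (3, 124, 'newest comment overall', '2016-01-03');
""")

# The subquery picks ONE comment for the whole table before the join happens,
# so the ON clause can only match the post that owns that single comment.
JOIN_ON_GLOBAL_LATEST = """
    SELECT p.title, c.body
    FROM posts AS p
    INNER JOIN (SELECT * FROM comments ORDER BY inserted_at DESC LIMIT 1) AS c
        ON c.post_id = p.id
    WHERE p.id = ?
"""

rows_for_123 = conn.execute(JOIN_ON_GLOBAL_LATEST, (123,)).fetchall()
rows_for_124 = conn.execute(JOIN_ON_GLOBAL_LATEST, (124,)).fetchall()
print(rows_for_123)  # []
print(rows_for_124)  # [('second', 'newest comment overall')]
```

Post 123 has comments, but none of them is the newest comment in the table, so the join finds nothing for it.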
FYI the query was >150ms with ~200,000 comment rows, but that was brought down to ~12ms with a simple index:
create index(:comments, ["(inserted_at::date) DESC"])
It's worth noting that while this query works in returning the post in question and only the most recent comment, it'll actually return $number_of_comments rows if you remove the limit: 1. So say if you wanted to retrieve all 100 posts in your database with the most recent comment of each, and you had 200,000 comments in the database, this query would return 200,000 rows. Instead you should use a LATERAL join as discussed below.
Update
Unfortunately, Ecto doesn't support LATERAL joins right now.
An Ecto fragment would work great here; however, the join query wraps the fragment in additional parentheses (i.e. INNER JOIN (LATERAL (SELECT …))), which isn't valid SQL, so you'd have to use raw SQL for now:
sql = """
SELECT p."title",
c."body"
FROM "posts" AS p
INNER JOIN LATERAL (SELECT c."id",
c."body",
c."inserted_at"
FROM "comments" AS c
WHERE ( c."post_id" = p."id" )
ORDER BY c."inserted_at" DESC
LIMIT 1) AS c
ON true
WHERE ( p."id" = 123 )
LIMIT 1
"""
res = Ecto.Adapters.SQL.query!(Repo, sql, [])
This query returns in <1ms on the same database.
Note this doesn't return your Ecto model struct, just the raw response from Postgrex.
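To see the "latest row per parent" result that the LATERAL join produces, here is a small sketch in Python with sqlite3. SQLite has no LATERAL, so this swaps in a correlated subquery that picks the newest comment id per post; it illustrates the result shape only and is not the PostgreSQL query above. The sample data is hypothetical.

```python
import sqlite3

# Hypothetical sample data; SQLite is used only as a convenient stand-in.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE comments (
        id INTEGER PRIMARY KEY, post_id INTEGER, body TEXT, inserted_at TEXT
    );
    INSERT INTO posts VALUES (1, 'first'), (2, 'second'), (3, 'no comments');
    INSERT INTO comments VALUES
        (1, 1, 'a-old', '2016-01-01'),
        (2, 1, 'a-new', '2016-01-05'),
        (3, 2, 'b-only', '2016-01-03');
""")

# The inner SELECT references p.id, so it is re-evaluated for every post,
# which is the same per-row correlation that LATERAL provides in PostgreSQL.
latest_per_post = conn.execute("""
    SELECT p.title, c.body
    FROM posts AS p
    INNER JOIN comments AS c ON c.post_id = p.id
    WHERE c.id = (SELECT c2.id
                  FROM comments AS c2
                  WHERE c2.post_id = p.id
                  ORDER BY c2.inserted_at DESC
                  LIMIT 1)
    ORDER BY p.id
""").fetchall()

print(latest_per_post)  # [('first', 'a-new'), ('second', 'b-only')]
```

As with INNER JOIN LATERAL ... ON true, a post without comments drops out of the result; one row per post is returned otherwise, regardless of how many comments exist.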

Related

Query with a sub query that requires multiple values

I can't really think of a title so let me explain the problem:
Problem: I want to return an array of Posts with each Post containing a Like Count. The Like Count is for a specific post but for all users who have liked it
For example:
const posts = [
{
post_id: 1,
like_count: 100
},
{
post_id: 2,
like_count: 50
}
]
Now with my current solution, I don't think it's possible but here is what I have so far.
My query currently looks like this (produced by TypeORM):
SELECT
"p"."uid" AS "p_uid",
"p"."created_at" AS "post_created_at",
"l"."uid" AS "like_uid",
"l"."post_liked" AS "post_liked",
"ph"."path" AS "path",
"ph"."title" AS "photo_title",
"u"."name" AS "post_author",
(
SELECT
COUNT(like_id) AS "like_count"
FROM
"likes" "l"
INNER JOIN
"posts" "p"
ON "p"."post_id" = "l"."post_id"
WHERE
"l"."post_liked" = true
AND l.post_id = $1
)
AS "like_count"
FROM
"posts" "p"
LEFT JOIN
"likes" "l"
ON "l"."post_id" = "p"."post_id"
INNER JOIN
"photos" "ph"
ON "ph"."photo_id" = "p"."photo_id"
INNER JOIN
"users" "u"
ON "u"."user_id" = "p"."user_id"
At $1 is where the post.post_id should go (but for the sake of testing I stuck the first post's id in there), assuming I have an array of post_ids ready to put in there.
My TypeORM query looks like this
async findAll(): Promise<Post[]> {
return await getRepository(Post)
.createQueryBuilder('p')
.select(['p.uid'])
.addSelect(subQuery =>
subQuery
.select('COUNT(like_id)', 'like_count')
.from(Like, 'l')
.innerJoin('l.post', 'p')
.where('l.post_liked = true AND l.post_id = :post_id', {post_id: 'a16f0c3e-5aa0-4cf8-82da-dfe27d3f991a'}), 'like_count'
)
.addSelect('p.created_at', 'post_created_at')
.addSelect('u.name', 'post_author')
.addSelect('l.uid', 'like_uid')
.addSelect('l.post_liked', 'post_liked')
.addSelect('ph.title', 'photo_title')
.addSelect('ph.path', 'path')
.leftJoin('p.likes', 'l')
.innerJoin('p.photo', 'ph')
.innerJoin('p.user', 'u')
.getRawMany()
}
Why am I doing this? What I am trying to avoid is calling count for every single post on my page to return the number of likes for each post. I thought I could somehow do this in a subquery but now I am not sure if it's possible.
Can someone suggest a more efficient way of doing something like this? Or is this approach completely wrong?

I find working with ORMs terrible and cannot help you with this. But the query itself has flaws:
You want one row per post, but you are joining likes, thus getting one row per post and like.
Your subquery is not related to your main query. It should instead relate to the main query's post.
The corrected query:
SELECT
p.uid,
p.created_at,
ph.path AS photo_path,
ph.title AS photo_title,
u.name AS post_author,
(
SELECT COUNT(*)
FROM likes l
WHERE l.post_id = p.post_id
AND l.post_liked = true
) AS like_count
FROM posts p
JOIN photos ph ON ph.photo_id = p.photo_id
JOIN users u ON u.user_id = p.user_id
ORDER BY p.uid;
I suppose it's quite easy for you to convert this to TypeORM. There is nothing wrong with counting for every single post, by the way. It is even necessary to get the result you are after.
The subquery could also be moved to the FROM clause using GROUP BY l.post_id within. As is, you are getting all posts, regardless of them having likes or not. By moving the subquery to the FROM clause, you could instead decide between INNER JOIN and LEFT OUTER JOIN.
The query would benefit from the following index:
CREATE INDEX idx ON likes (post_id, post_liked);
Provide this index, if the query seems too slow.
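The two variants described above (a correlated subquery in the SELECT list, and a grouped subquery in the FROM clause) can be compared with a short sketch. This uses Python's sqlite3 module rather than PostgreSQL, with a cut-down, hypothetical version of the posts and likes tables.

```python
import sqlite3

# Hypothetical, reduced schema: only the columns the count needs.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE posts (post_id INTEGER PRIMARY KEY, uid TEXT);
    CREATE TABLE likes (
        like_id INTEGER PRIMARY KEY, post_id INTEGER, post_liked INTEGER
    );
    INSERT INTO posts VALUES (1, 'p1'), (2, 'p2'), (3, 'p3');
    INSERT INTO likes VALUES (1, 1, 1), (2, 1, 1), (3, 1, 0), (4, 2, 1);
""")

# Variant 1: correlated subquery in the SELECT list, one row per post.
correlated = conn.execute("""
    SELECT p.uid,
           (SELECT COUNT(*)
            FROM likes AS l
            WHERE l.post_id = p.post_id AND l.post_liked = 1) AS like_count
    FROM posts AS p
    ORDER BY p.uid
""").fetchall()

# Variant 2: pre-aggregated subquery in the FROM clause. LEFT JOIN keeps
# posts without likes; switching to INNER JOIN would drop them.
grouped = conn.execute("""
    SELECT p.uid, COALESCE(lc.like_count, 0) AS like_count
    FROM posts AS p
    LEFT JOIN (SELECT post_id, COUNT(*) AS like_count
               FROM likes
               WHERE post_liked = 1
               GROUP BY post_id) AS lc ON lc.post_id = p.post_id
    ORDER BY p.uid
""").fetchall()

print(correlated)  # [('p1', 2), ('p2', 1), ('p3', 0)]
print(grouped)     # [('p1', 2), ('p2', 1), ('p3', 0)]
```

Neither variant joins likes directly into the outer query, so the result stays at one row per post.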

Select a field called "return" in postgreSQL

I'm having a problem with a query in Postgres. The table cgporders_items has a field called return, and I cannot get the actual value of that field with this query; it returns all zeros.
SELECT "Cgporder".id AS "Cgporder__id"
,"Sale".preorder_number AS "Sale__preorder_number"
,"Contact".id AS "Contact__id"
,"Contact".NAME AS "Contact__name"
,"Ptype".NAME AS "Ptype__name"
,(
SELECT code
FROM products
WHERE id = "CgporderItem".parent_id
) AS "Product__parent_code"
,"Product".id AS "Product__id"
,"Product".code AS "Product__code"
,"Product".NAME AS "Product__name"
,"CgporderItem".quantity AS "CgporderItem__quantity"
,"CgporderItem".return AS "CgporderItem__return"
,"CgporderItem".cep_id AS "CgporderItem__cep"
FROM cgporders AS "Cgporder"
INNER JOIN contacts AS "Contact" ON ("Contact".id = "Cgporder".contact_id)
INNER JOIN cgporders_items AS "CgporderItem" ON ("Cgporder".id = "CgporderItem".cgporder_id)
INNER JOIN products AS "Product" ON ("Product".id = "CgporderItem".product_id)
INNER JOIN ptypes AS "Ptype" ON ("Ptype".id = "Product".ptype_id)
LEFT JOIN cgporders_sales AS "CgporderSale" ON ("Cgporder".id = "CgporderSale".cgporder_id)
LEFT JOIN sales AS "Sale" ON ("Sale".id = "CgporderSale".sale_id)
WHERE "CgporderItem".parent_id != 0
AND "Cgporder"."issue_date" >= '2015-11-27'
AND "Cgporder"."issue_date" <= '2015-11-27'
AND "Cgporder"."status" = 'confirmed'
ORDER BY "Ptype".NAME
,"Product"."code";
There are actually a lot of rows that match the select condition, but the query returns zero for "CgporderItem".return AS "CgporderItem__return".
If I run a simple query like select "return" from cgporders_items, it works. But in this query it does not.
Can you help me please?

"return" is a reserved word in SQL, but not in Postgres. See the list here. The following code works fine in Postgres (SQL Fiddle is here):
create table dum (return int);
select dum.return from dum;
Your problem is something else. If I had to guess, the where clause is too restrictive (the condition on dates is a bit suspect).

Select Count of one table into another

I have one SQL statement as:
SELECT ARTICLES.NEWS_ARTCL_ID, ARTICLES.NEWS_ARTCL_TTL_DES,
ARTICLES.NEWS_ARTCL_CNTNT_T, ARTICLES.NEWS_ARTCL_PUB_DT,
ARTICLES.NEWS_ARTCL_AUTH_NM, ARTICLES.NEWS_ARTCL_URL, ARTICLES.MEDIA_URL,
ARTICLES.ARTCL_SRC_ID, SOURCES.ARTCL_SRC_NM, MEDIA.MEDIA_TYPE_DESCRIP
FROM
RSKLMOBILEB2E.NEWS_ARTICLE ARTICLES,
RSKLMOBILEB2E.MEDIA_TYPE MEDIA,
RSKLMOBILEB2E.ARTICLE_SOURCE SOURCES
WHERE ARTICLES.MEDIA_TYPE_IDENTIF = MEDIA.MEDIA_TYPE_IDENTIF
AND ARTICLES.ARTCL_SRC_ID = SOURCES.ARTCL_SRC_ID
AND ARTICLES.ARTCL_SRC_ID = 1
ORDER BY ARTICLES.NEWS_ARTCL_PUB_DT
Now I need to combine another SQL statement into one which is:
SELECT COUNT ( * )
FROM RSKLMOBILEB2E.NEWS_LIKES LIKES, RSKLMOBILEB2E.NEWS_ARTICLE ARTICLES
WHERE LIKES.NEWS_ARTCL_ID = ARTICLES.NEWS_ARTCL_ID
Basically I have one table which contains articles and I need to include the user likes which is in another table.

Use a subquery to add the likescount in your first query like this:
SELECT ARTICLES.NEWS_ARTCL_ID
,ARTICLES.NEWS_ARTCL_TTL_DES
,ARTICLES.NEWS_ARTCL_CNTNT_T
,ARTICLES.NEWS_ARTCL_PUB_DT
,ARTICLES.NEWS_ARTCL_AUTH_NM
,ARTICLES.NEWS_ARTCL_URL
,ARTICLES.MEDIA_URL
,ARTICLES.ARTCL_SRC_ID
,SOURCES.ARTCL_SRC_NM
,MEDIA.MEDIA_TYPE_DESCRIP
,(
SELECT COUNT(*)
FROM RSKLMOBILEB2E.NEWS_LIKES LIKES
WHERE LIKES.NEWS_ARTCL_ID = ARTICLES.NEWS_ARTCL_ID
) AS LikesCount
FROM RSKLMOBILEB2E.NEWS_ARTICLE ARTICLES
,RSKLMOBILEB2E.MEDIA_TYPE MEDIA
,RSKLMOBILEB2E.ARTICLE_SOURCE SOURCES
WHERE ARTICLES.MEDIA_TYPE_IDENTIF = MEDIA.MEDIA_TYPE_IDENTIF
AND ARTICLES.ARTCL_SRC_ID = SOURCES.ARTCL_SRC_ID
AND ARTICLES.ARTCL_SRC_ID = 1
ORDER BY ARTICLES.NEWS_ARTCL_PUB_DT;

I'm not sure what you are trying to achieve but it seems you want to count all the data from 2 tables. You can edit your query to something like this.
SELECT COUNT (ARTICLES.*) FROM RSKLMOBILEB2E.NEWS_LIKES LIKES
JOIN RSKLMOBILEB2E.NEWS_ARTICLE ARTICLES
ON LIKES.NEWS_ARTCL_ID = ARTICLES.NEWS_ARTCL_ID

I think the solution is to use analytic functions. Please have a look at https://oracle-base.com/articles/misc/analytic-functions
Please check the following query (keep in mind I have no idea about your table structures). Because of the left join, records might be duplicated, which is why grouping is added.
SELECT ARTICLES.NEWS_ARTCL_ID, ARTICLES.NEWS_ARTCL_TTL_DES,
ARTICLES.NEWS_ARTCL_CNTNT_T, ARTICLES.NEWS_ARTCL_PUB_DT,
ARTICLES.NEWS_ARTCL_AUTH_NM, ARTICLES.NEWS_ARTCL_URL, ARTICLES.MEDIA_URL,
ARTICLES.ARTCL_SRC_ID, SOURCES.ARTCL_SRC_NM, MEDIA.MEDIA_TYPE_DESCRIP,
count(LIKES.ID) over ( partition by ARTICLES.NEWS_ARTCL_ID ) as num_likes
FROM RSKLMOBILEB2E.NEWS_ARTICLE ARTICLES
join RSKLMOBILEB2E.MEDIA_TYPE MEDIA
on ARTICLES.MEDIA_TYPE_IDENTIF = MEDIA.MEDIA_TYPE_IDENTIF
join RSKLMOBILEB2E.ARTICLE_SOURCE SOURCES
on ARTICLES.ARTCL_SRC_ID = SOURCES.ARTCL_SRC_ID
LEFT JOIN RSKLMOBILEB2E.NEWS_LIKES LIKES
ON LIKES.NEWS_ARTCL_ID = ARTICLES.NEWS_ARTCL_ID
WHERE
ARTICLES.ARTCL_SRC_ID = 1
group by ARTICLES.NEWS_ARTCL_ID, ARTICLES.NEWS_ARTCL_TTL_DES,
ARTICLES.NEWS_ARTCL_CNTNT_T, ARTICLES.NEWS_ARTCL_PUB_DT,
ARTICLES.NEWS_ARTCL_AUTH_NM, ARTICLES.NEWS_ARTCL_URL, ARTICLES.MEDIA_URL,
ARTICLES.ARTCL_SRC_ID, SOURCES.ARTCL_SRC_NM, MEDIA.MEDIA_TYPE_DESCRIP
ORDER BY ARTICLES.NEWS_ARTCL_PUB_DT
I also changed the comma-separated list of tables and the join conditions in the where clause to explicit joins. I think this is more readable, since table join conditions are separated from result filtering in the where clause.

how to load 2 related datasets together? (i.e posts and comments)

I'm fairly new to pg and trying to figure out what the best approach is to loading a set of posts and their associated comments together.
For example:
I'm trying to fetch 10 posts and the comments associated with all those posts, like Facebook's wall, where you see a feed of posts and comments loaded on the same page. My schema looks something like this:
Posts
--------
id - author - description - date - commentCount
Comments
-------
id - post_id - author - description - date
I tried to load both posts and comments on the same postgres function doing the follow:
select *
from posts
LEFT join comments on posts.id = comments.post_id
Unfortunately, it duplicates each post N times, where N is the number of comments the post has. So the first option is to filter the duplicates out in Node after fetching the data.
Also when I try to use group by posts.id (to make it easier to traverse in node) I get the following error:
column "comments.id" must appear in the GROUP BY clause or be used in an aggregate function
The second thing I can try is to send an array of the post_ids I want to load and have a Postgres function load the comments and send them back, but I can't quite get the query right:
CREATE OR REPLACE FUNCTION "getPosts"(postIds int[])
RETURNS text AS
$BODY$
BEGIN
RETURN (
SELECT *
FROM Comments
WHERE Comments.id = postIds[0]
);
END;$BODY$
LANGUAGE plpgsql VOLATILE
COST 100;
to call it:
SELECT n FROM "public"."getPosts"(array[38]) As n;
However, even when trying to get value from one index I get the following error:
ERROR: subquery must return only one column
LINE 1: SELECT (
^
QUERY: SELECT (
SELECT *
FROM Comments
WHERE Comments.id = 38
)
Finally, the last option is to simply make N separate calls to Postgres, where N is the number of posts with comments. So if I have 5 posts with comments, I make 5 calls to Postgres, each with a post_id, and select from the Comments table.
I'm really not sure what to do here, any help would be appreciated.
Thanks

To have all comments as an array of records for each post:
select
p.id, p.title, p.content, p.author,
array_agg(c) as comments
from
posts p
left join
comments c on p.id = c.post_id
group by 1, 2, 3, 4
Or one array for each comment column:
select
p.id, p.title, p.content, p.author,
array_agg(c.author) as comment_author,
array_agg(c.content) as comment_content
from
posts p
left join
comments c on p.id = c.post_id
group by 1, 2, 3, 4
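For comparison, the client-side alternative mentioned in the question (fetch the plain LEFT JOIN result and nest it in application code) can be sketched as follows. This is written in Python rather than Node, and the row layout (post_id, post_author, comment_id, comment_author) is an assumption made for the example, not the asker's actual columns.

```python
def nest_comments(rows):
    """Group flat (post, comment) join rows into one dict per post.

    Each row is assumed to be (post_id, post_author, comment_id, comment_author);
    comment fields are None for posts without comments (LEFT JOIN).
    """
    posts = {}  # dicts preserve insertion order, so post order is kept
    for post_id, post_author, comment_id, comment_author in rows:
        post = posts.setdefault(
            post_id, {"id": post_id, "author": post_author, "comments": []}
        )
        if comment_id is not None:
            post["comments"].append({"id": comment_id, "author": comment_author})
    return list(posts.values())


# Hypothetical rows as a LEFT JOIN would return them.
rows = [
    (1, "ann", 10, "bob"),
    (1, "ann", 11, "cid"),
    (2, "dee", None, None),
]
nested = nest_comments(rows)
print(nested)
```

One difference worth knowing: with a LEFT JOIN, array_agg in PostgreSQL yields an array containing a single NULL for a post without comments, whereas this sketch yields an empty list.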

Timeout running SQL query

I'm trying to use the aggregation features of the Django ORM to run a query on an MSSQL 2008 R2 database, but I keep getting a timeout error. The query (generated by Django) which fails is below. I've tried running it directly in SQL Server Management Studio and it works, but takes 3.5 minutes.
It does look like it's aggregating over a bunch of fields which it doesn't need to, but I wouldn't have thought that should cause it to take that long. The database isn't that big either: auth_user has 9 records, ticket_ticket has 1210, and ticket_watchers has 1876. Is there something I'm missing?
SELECT
[auth_user].[id],
[auth_user].[password],
[auth_user].[last_login],
[auth_user].[is_superuser],
[auth_user].[username],
[auth_user].[first_name],
[auth_user].[last_name],
[auth_user].[email],
[auth_user].[is_staff],
[auth_user].[is_active],
[auth_user].[date_joined],
COUNT([tickets_ticket].[id]) AS [tickets_captured__count],
COUNT(T3.[id]) AS [assigned_tickets__count],
COUNT([tickets_ticket_watchers].[ticket_id]) AS [tickets_watched__count]
FROM
[auth_user]
LEFT OUTER JOIN [tickets_ticket] ON ([auth_user].[id] = [tickets_ticket].[capturer_id])
LEFT OUTER JOIN [tickets_ticket] T3 ON ([auth_user].[id] = T3.[responsible_id])
LEFT OUTER JOIN [tickets_ticket_watchers] ON ([auth_user].[id] = [tickets_ticket_watchers].[user_id])
GROUP BY
[auth_user].[id],
[auth_user].[password],
[auth_user].[last_login],
[auth_user].[is_superuser],
[auth_user].[username],
[auth_user].[first_name],
[auth_user].[last_name],
[auth_user].[email],
[auth_user].[is_staff],
[auth_user].[is_active],
[auth_user].[date_joined]
HAVING
(COUNT([tickets_ticket].[id]) > 0 OR COUNT(T3.[id]) > 0 )
EDIT:
Here are the relevant indexes (excluding those not used in the query):
auth_user.id (PK)
auth_user.username (Unique)
tickets_ticket.id (PK)
tickets_ticket.capturer_id
tickets_ticket.responsible_id
tickets_ticket_watchers.id (PK)
tickets_ticket_watchers.user_id
tickets_ticket_watchers.ticket_id
EDIT 2:
After a bit of experimentation, I've found that the following query is the smallest that results in the slow execution:
SELECT
COUNT([tickets_ticket].[id]) AS [tickets_captured__count],
COUNT(T3.[id]) AS [assigned_tickets__count],
COUNT([tickets_ticket_watchers].[ticket_id]) AS [tickets_watched__count]
FROM
[auth_user]
LEFT OUTER JOIN [tickets_ticket] ON ([auth_user].[id] = [tickets_ticket].[capturer_id])
LEFT OUTER JOIN [tickets_ticket] T3 ON ([auth_user].[id] = T3.[responsible_id])
LEFT OUTER JOIN [tickets_ticket_watchers] ON ([auth_user].[id] = [tickets_ticket_watchers].[user_id])
GROUP BY
[auth_user].[id]
The weird thing is that if I comment out any two lines in the above, it runs in less than 1s, and it doesn't seem to matter which lines I remove (although obviously I can't remove a join without also removing the relevant SELECT line).
EDIT 3:
The python code which generated this is:
User.objects.annotate(
Count('tickets_captured'),
Count('assigned_tickets'),
Count('tickets_watched')
)
A look at the execution plan shows that SQL Server is first doing a cross join on all the tables, resulting in about 280 million rows and 6 GB of data. I assume that this is where the problem lies, but why is it happening?

SQL Server is doing exactly what it was asked to do. Unfortunately, Django is not generating the right query for what you want. It looks like you need to count distinct, instead of just count: Django annotate() multiple times causes wrong answers
As for why the query works that way: the query says to join the four tables together. So if a user has 2 captured tickets, 3 assigned tickets, and 4 watched tickets, the join returns 2*3*4 = 24 rows for that user, one for each combination of tickets, and each plain COUNT counts all 24. The distinct part removes the duplicates.
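The multiplication is easy to verify on a toy dataset. Below is a minimal sketch using Python's sqlite3 module instead of SQL Server; the table and column names are simplified stand-ins for auth_user, tickets_ticket and tickets_ticket_watchers, and the data is invented so that user 1 has exactly 2 captured, 3 assigned and 4 watched tickets.

```python
import sqlite3

# Simplified, hypothetical stand-ins for the Django tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY);
    CREATE TABLE tickets (
        id INTEGER PRIMARY KEY, capturer_id INTEGER, responsible_id INTEGER
    );
    CREATE TABLE watchers (
        id INTEGER PRIMARY KEY, user_id INTEGER, ticket_id INTEGER
    );
    INSERT INTO users VALUES (1), (2);
    -- user 1 captured tickets 1-2 and is responsible for tickets 3-5
    INSERT INTO tickets VALUES
        (1, 1, 2), (2, 1, 2), (3, 2, 1), (4, 2, 1), (5, 2, 1);
    -- user 1 watches four tickets
    INSERT INTO watchers VALUES (1, 1, 1), (2, 1, 2), (3, 1, 3), (4, 1, 4);
""")

plain_and_distinct = conn.execute("""
    SELECT COUNT(t1.id), COUNT(t3.id), COUNT(w.id),
           COUNT(DISTINCT t1.id), COUNT(DISTINCT t3.id), COUNT(DISTINCT w.id)
    FROM users AS u
    LEFT JOIN tickets AS t1 ON u.id = t1.capturer_id
    LEFT JOIN tickets AS t3 ON u.id = t3.responsible_id
    LEFT JOIN watchers AS w ON u.id = w.user_id
    WHERE u.id = 1
    GROUP BY u.id
""").fetchone()

# Plain COUNT sees every combination (2 * 3 * 4 = 24 rows);
# COUNT(DISTINCT ...) recovers the real per-relation counts.
print(plain_and_distinct)  # (24, 24, 24, 2, 3, 4)
```

COUNT(DISTINCT ...) fixes the numbers but the database still has to build the multiplied intermediate result, which is why the row count explodes on real data.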

what about this?
SELECT auth_user.*,
C1.tickets_captured__count,
C2.assigned_tickets__count,
C3.tickets_watched__count
FROM
auth_user
LEFT JOIN
( SELECT capturer_id, COUNT(*) AS tickets_captured__count
FROM tickets_ticket GROUP BY capturer_id ) AS C1 ON auth_user.id = C1.capturer_id
LEFT JOIN
( SELECT responsible_id, COUNT(*) AS assigned_tickets__count
FROM tickets_ticket GROUP BY responsible_id ) AS C2 ON auth_user.id = C2.responsible_id
LEFT JOIN
( SELECT user_id, COUNT(*) AS tickets_watched__count
FROM tickets_ticket_watchers GROUP BY user_id ) AS C3 ON auth_user.id = C3.user_id
WHERE C1.tickets_captured__count > 0 OR C2.assigned_tickets__count > 0
--WHERE C1.tickets_captured__count is not null OR C2.assigned_tickets__count is not null -- also works (I think with better performance)
