Postgres LATERAL Query Correctness and Efficiency - sql

I have the following data structure, with each table name followed by its pertinent column names.
common_authorprofile:
{id, full_name, description, avatar_id, profile_id}
aldryn_people_person:
{id, phone, ...}
aldryn_newsblog_article:
{id, is_published, is_featured, ..., author_id}
It bears noting that common_authorprofile.profile_id = aldryn_people_person.id and aldryn_newsblog_article.author_id = aldryn_people_person.id
I am trying to compute the number of articles for each entity in common_authorprofile.
This is how it is currently done:
SELECT main.*, sub.article_count
FROM common_authorprofile AS main
INNER JOIN aldryn_people_person
ON aldryn_people_person.id = main.profile_id,
LATERAL
(SELECT author_id, COUNT(*) as article_count
FROM aldryn_newsblog_article AS sub
WHERE
sub.author_id = aldryn_people_person.id AND
sub.app_config_id = 1 AND
sub.is_published IS TRUE AND
sub.publishing_date <= now() AND
aldryn_people_person.id = sub.author_id
GROUP BY author_id
) AS sub
My question is two-fold:
Is this a correct way of doing it, given the table relationships?
Is this an efficient way, i.e., is there a way to improve its speed and readability?

Dropping aldryn_people_person out of the mix makes this easier to read.
I also prefer common table expressions over subqueries or lateral joins for readability, but CTEs can slow down execution. I refactor only if speed is a problem.
I would approach it like this:
with article_counts as (
select author_id, count(*) as article_count
from aldryn_newsblog_article
where app_config_id = 1
and is_published
and publishing_date <= now()
group by author_id
)
select prof.*, coalesce(ac.article_count, 0) as article_count
from common_authorprofile prof
left join article_counts ac
on ac.author_id = prof.profile_id;
The left outer join buys you the retrieval of all common_authorprofile records. The coalesce() displays missing rows from the article_counts CTE as 0. You can change the left join to just join if that is not what you want.
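To see what the left join and coalesce() change in practice, here is a minimal sketch that runs the same query shape against an in-memory SQLite database from Python. SQLite is only a stand-in for Postgres here, so now() becomes datetime('now'); the sample rows and the reduced column lists are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE common_authorprofile (id INTEGER PRIMARY KEY, full_name TEXT, profile_id INTEGER);
CREATE TABLE aldryn_newsblog_article (
    id INTEGER PRIMARY KEY, author_id INTEGER, app_config_id INTEGER,
    is_published INTEGER, publishing_date TEXT);
-- Hypothetical data: Ann has two published articles and one draft, Bob has none.
INSERT INTO common_authorprofile VALUES (1, 'Ann', 10), (2, 'Bob', 20);
INSERT INTO aldryn_newsblog_article VALUES
    (1, 10, 1, 1, '2000-01-01 00:00:00'),
    (2, 10, 1, 1, '2000-01-02 00:00:00'),
    (3, 10, 1, 0, '2000-01-03 00:00:00');
""")

# Same CTE as in the answer; only the join type is swapped in.
query = """
WITH article_counts AS (
    SELECT author_id, COUNT(*) AS article_count
    FROM aldryn_newsblog_article
    WHERE app_config_id = 1
      AND is_published
      AND publishing_date <= datetime('now')
    GROUP BY author_id
)
SELECT prof.full_name, COALESCE(ac.article_count, 0) AS article_count
FROM common_authorprofile prof
{join} article_counts ac ON ac.author_id = prof.profile_id
ORDER BY prof.id
"""

left_rows = conn.execute(query.format(join="LEFT JOIN")).fetchall()
inner_rows = conn.execute(query.format(join="JOIN")).fetchall()
print(left_rows)   # every profile, with 0 for authors without articles
print(inner_rows)  # only profiles that have at least one counted article
```

With the left join Bob is kept with a count of 0; with a plain join he drops out of the result.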
If you have any questions, please comment.

Related

How to join three tables having relation parent-child-child's child. And I want to access all records related to parent

I have three tables:
articles(id,title,message)
comments(id,article_id,commentedUser_id,comment)
comment_likes(id, likedUser_id, comment_id, action_like, action_dislike)
I want to show comments.id, comments.commentedUser_id, comments.comment, and a likes column, i.e. (select count(action_like) where action_like = 'like' and comment_id = comments.id), for the comments where comments.article_id = articles.id.
In other words, I want to count all the likes related to each comment, and show all comments of an article.
action_like has only two possible values: null or 'like'.
SELECT c.id , c.CommentedUser_id , c.comment , (cl.COUNT(action_like) WHERE action_like='like' AND comment_id='c.id') as likes
FROM comment_likes as cl
LEFT JOIN comments as c ON c.id=cl.comment_id
WHERE c.article_id=article.id
It shows nothing. I know this is the wrong way to do it; it is only meant to show what I am after.
I guess you are looking for something like the query below. It returns the like count per article and comment.
SELECT
a.id article_id,
c.id comment_id,
c.CommentedUser_id ,
c.comment ,
COUNT(CASE WHEN cl.action_like = 'like' THEN 1 ELSE NULL END) as likes
FROM articles a
INNER JOIN comments c ON a.id = c.article_id
LEFT JOIN comment_likes as cl ON c.id=cl.comment_id
GROUP BY a.id,c.id , c.CommentedUser_id , c.comment
If you need results for a specific article, you can add a WHERE clause before the GROUP BY section, e.g. WHERE a.id = N
I would recommend a correlated subquery for this:
SELECT a.id as article_id, c.id as comment_id,
c.CommentedUser_id, c.comment,
(SELECT COUNT(*)
FROM comment_likes cl
WHERE cl.comment_id = c.id AND
cl.action_like = 'like'
) as num_likes
FROM articles a INNER JOIN
comments c
ON a.id = c.article_id;
This is a case where a correlated subquery often has noticeably better performance than an outer aggregation, particularly with the right index. The index you want is on comment_likes(comment_id, action_like).
Why is the performance better? Most databases will implement the group by by sorting the data. Sorting is an expensive operation that grows super-linearly -- that is, twice as much data takes more than twice as long to sort.
The correlated subquery breaks the problem down into smaller pieces. In fact, no sorting should be necessary -- just scanning the index and counting the matching rows.
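A quick way to convince yourself that the two formulations return the same counts is to run both on a toy data set. The sketch below uses an in-memory SQLite database from Python; the sample rows and the index name idx_likes are made up, and it only checks equivalence of the results, not performance.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE comments (id INTEGER PRIMARY KEY, article_id INTEGER, comment TEXT);
CREATE TABLE comment_likes (id INTEGER PRIMARY KEY, comment_id INTEGER, action_like TEXT);
-- The index recommended above (the name is hypothetical).
CREATE INDEX idx_likes ON comment_likes (comment_id, action_like);
INSERT INTO articles VALUES (1, 'first');
INSERT INTO comments VALUES (1, 1, 'nice'), (2, 1, 'meh');
-- Comment 1 has two likes and one NULL action; comment 2 has no rows at all.
INSERT INTO comment_likes VALUES (1, 1, 'like'), (2, 1, 'like'), (3, 1, NULL);
""")

# Outer aggregation: join everything, then group.
grouped = conn.execute("""
SELECT c.id, COUNT(CASE WHEN cl.action_like = 'like' THEN 1 END) AS likes
FROM articles a
JOIN comments c ON a.id = c.article_id
LEFT JOIN comment_likes cl ON c.id = cl.comment_id
GROUP BY c.id
ORDER BY c.id
""").fetchall()

# Correlated subquery: one small count per comment row.
correlated = conn.execute("""
SELECT c.id,
       (SELECT COUNT(*)
        FROM comment_likes cl
        WHERE cl.comment_id = c.id AND cl.action_like = 'like') AS likes
FROM articles a
JOIN comments c ON a.id = c.article_id
ORDER BY c.id
""").fetchall()

print(grouped, correlated)
```

Both lists contain one row per comment, including the comment without any likes.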

SQL: how to count entries in multiple tables by value?

I have two tables, question and field. I need to count the entries that have a matching template_id value (both tables contain this column).
Please advise how to do it.
select count(q.*)
from question q
left join field f on f.template.id = q.template_id
On Stack Overflow you are expected to show your own attempt, so that it is clear some effort was made.
An inner join like the one below is probably what you meant. First try select q.*, f.* to inspect the matching rows.
SELECT
COUNT(*) AS TotalRecords
FROM question q
INNER JOIN field f ON f.template_id = q.template_id
If you want the count of distinct template_id in the two tables, use JOIN and COUNT(DISTINCT):
select count(distinct q.template_id)
from question q join
field f
on f.template_id = q.template_id;
If you use count(*) you will get a count of matching rows, not template_ids, so duplicates will affect the result.
If template_id is known to be unique in one of the tables (say question), then exists is probably more efficient:
select count(*)
from question q
where exists (select 1
from field f
where f.template_id = q.template_id
);
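The difference between the three counts above is easiest to see on a tiny data set where one template_id appears twice in field. This is a sketch using an in-memory SQLite database from Python; the rows and the scalar helper are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE question (id INTEGER PRIMARY KEY, template_id INTEGER);
CREATE TABLE field (id INTEGER PRIMARY KEY, template_id INTEGER);
-- template_id is unique in question; template 1 appears twice in field,
-- and template 3 has no field at all.
INSERT INTO question VALUES (1, 1), (2, 2), (3, 3);
INSERT INTO field VALUES (1, 1), (2, 1), (3, 2);
""")

def scalar(sql):
    """Run a query that returns a single value and return that value."""
    return conn.execute(sql).fetchone()[0]

# Counts joined rows, so the duplicate in field is counted twice.
matching_rows = scalar("""
SELECT COUNT(*)
FROM question q
JOIN field f ON f.template_id = q.template_id
""")

# Counts each matching template_id once.
distinct_templates = scalar("""
SELECT COUNT(DISTINCT q.template_id)
FROM question q
JOIN field f ON f.template_id = q.template_id
""")

# Counts question rows that have at least one field.
questions_with_field = scalar("""
SELECT COUNT(*)
FROM question q
WHERE EXISTS (SELECT 1 FROM field f WHERE f.template_id = q.template_id)
""")

print(matching_rows, distinct_templates, questions_with_field)
```

The COUNT(DISTINCT ...) and EXISTS results agree here only because template_id is unique in question.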

Where clause applied to only one column in join

I'm having some trouble writing a certain SQL query. I have a wallet table and a balances table, which I join. The query currently looks like this:
SELECT
`balances`.`id` AS `id`
FROM
`wallet`
LEFT JOIN `balances` ON
( `wallet`.`currency` = `balances`.`currency` )
WHERE
`balances`.`user_id` = '181'
Because of the where clause, the query returns only the matching records. I want to get all records from the wallet table and only those from balances which match the where clause... I hope I explained it well enough!
Cheers!
Use a subquery:
SELECT w.*,t.*
FROM
wallet w
LEFT JOIN ( select * from balances where user_id = 181
) t ON w.currency =t.currency
The issue is that the WHERE clause filters on balances, the right-hand table of the LEFT JOIN, which discards the wallet rows that have no match.
Use the query below.
SELECT
`balances`.`id` AS `id`
FROM
`wallet`
LEFT JOIN (SELECT * FROM `balances` WHERE `user_id` = '181') AS `balances` ON
( `wallet`.`currency` = `balances`.`currency` );
The question is not fully clear, but you almost certainly need an extra join condition on some sort of ID. As written, there is no way to match a wallet with its balance(s). Assuming that balances has, e.g., a wallet_id column, you'll want something like:
SELECT
`balances`.`id` AS `id`
FROM
`wallet`
LEFT JOIN `balances` ON
(`wallet`.`id` = `balances`.`wallet_id` )
WHERE
`balances`.`user_id` = '181'
Move the condition to the ON clause. Don't use subqueries!
SELECT w.*, b.id
FROM wallet w LEFT JOIN
balances b
ON w.currency = b.currency AND
b.user_id = 181;
Notes:
The subquery in the FROM can impede the optimizer.
If you are using a LEFT JOIN, you should be selecting columns from the first table.
I am guessing that user_id is a number, so I removed the quotes around the comparison value.
Table aliases make the query easier to write and to read.
Backticks make the query harder to write and harder to read.
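The effect of moving the condition from WHERE to ON can be checked on two rows. The following is a minimal sketch against an in-memory SQLite database from Python; the sample wallets, balances, and user ids are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE wallet (currency TEXT PRIMARY KEY);
CREATE TABLE balances (id INTEGER PRIMARY KEY, user_id INTEGER, currency TEXT);
INSERT INTO wallet VALUES ('USD'), ('EUR');
-- User 181 only has a USD balance; the EUR balance belongs to someone else.
INSERT INTO balances VALUES (1, 181, 'USD'), (2, 7, 'EUR');
""")

# Filter in WHERE: applied after the join, so unmatched wallets are dropped.
where_rows = conn.execute("""
SELECT w.currency, b.id
FROM wallet w
LEFT JOIN balances b ON w.currency = b.currency
WHERE b.user_id = 181
ORDER BY w.currency
""").fetchall()

# Filter in ON: applied while matching, so every wallet row survives.
on_rows = conn.execute("""
SELECT w.currency, b.id
FROM wallet w
LEFT JOIN balances b ON w.currency = b.currency AND b.user_id = 181
ORDER BY w.currency
""").fetchall()

print(where_rows)
print(on_rows)
```

In the second result the EUR wallet is still present, with NULL (None in Python) in the balance column.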

Counting empty relations from a SQL table

I'm trying to count authors who don't have any articles in our system, which aggregates authorship across sites. I've got a query working, but it isn't performant.
The best query I have thus far is this:
select count(*) as count_all
from (
select authors.id
from authors
left outer join site_authors on site_authors.author_id = authors.id
left outer join articles on articles.site_author_id = site_authors.id
group by authors.id
having count(articles.id) = 0
) a;
However, the subquery is rather inefficient. I was hoping there's a way to flatten this. I have several similar queries that add extra conditions on the left outer joins, so adding a count column to my schema isn't really an option here.
Extra rub: this is a cross-platform query and needs to work against PostgreSQL, SQLite, and MySQL.
You can try a slightly different query, but I'm not sure that it will be faster:
select count(*)
from authors as a
where not exists (
select b.id
from site_authors as b
inner join
articles as c
on a.id=b.author_id and b.id=c.site_author_id)
Of course, I assume you have proper indexes on the tables:
site_authors: unique (author_id, id)
articles: non unique (site_author_id)
Assuming that 'normal' joins are simpler and faster, you could subtract the number of authors with articles from the total number of authors:
SELECT (SELECT COUNT(*)
FROM authors) -
(SELECT COUNT(DISTINCT site_authors.author_id)
FROM site_authors
JOIN articles ON articles.site_author_id = site_authors.id)
Alternatively, try a subquery:
SELECT COUNT(*)
FROM authors
WHERE id NOT IN (SELECT site_authors.author_id
FROM site_authors
JOIN articles ON articles.site_author_id = site_authors.id)
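Before comparing speed, it is worth confirming that the rewrites return the same number as the original query. Here is a small sketch using an in-memory SQLite database from Python (SQLite being one of the three target engines); the sample rows are made up, and the NOT EXISTS variant puts the correlation in a WHERE clause instead of the ON clause, which is an equivalent way to write it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE authors (id INTEGER PRIMARY KEY);
CREATE TABLE site_authors (id INTEGER PRIMARY KEY, author_id INTEGER);
CREATE TABLE articles (id INTEGER PRIMARY KEY, site_author_id INTEGER);
-- Author 1 has an article, author 2 has a site entry but no article,
-- author 3 has no site entry at all.
INSERT INTO authors VALUES (1), (2), (3);
INSERT INTO site_authors VALUES (10, 1), (20, 2);
INSERT INTO articles VALUES (100, 10);
""")

queries = {
    "group_by_having": """
        SELECT COUNT(*) FROM (
            SELECT authors.id
            FROM authors
            LEFT OUTER JOIN site_authors ON site_authors.author_id = authors.id
            LEFT OUTER JOIN articles ON articles.site_author_id = site_authors.id
            GROUP BY authors.id
            HAVING COUNT(articles.id) = 0
        ) a""",
    "not_exists": """
        SELECT COUNT(*)
        FROM authors AS a
        WHERE NOT EXISTS (
            SELECT b.id
            FROM site_authors AS b
            INNER JOIN articles AS c ON b.id = c.site_author_id
            WHERE b.author_id = a.id)""",
    "subtraction": """
        SELECT (SELECT COUNT(*) FROM authors) -
               (SELECT COUNT(DISTINCT site_authors.author_id)
                FROM site_authors
                JOIN articles ON articles.site_author_id = site_authors.id)""",
    "not_in": """
        SELECT COUNT(*)
        FROM authors
        WHERE id NOT IN (SELECT site_authors.author_id
                         FROM site_authors
                         JOIN articles ON articles.site_author_id = site_authors.id)""",
}

counts = {name: conn.execute(sql).fetchone()[0] for name, sql in queries.items()}
print(counts)
```

One caveat for the NOT IN form: if site_authors.author_id can be NULL for a row that has articles, NOT IN matches nothing, so NOT EXISTS is the safer choice on nullable columns.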
It might be simpler and faster to use NOT IN rather than a join. SQL engines are pretty smart about using indexes even when the query looks obtuse. Something like this:
select count(*)
from authors
where id not in (select author_id
from site_authors
where site_authors.id in (select site_author_id from articles));
Be sure that author_id and site_author_id are indexed. The optimizer can then usually turn the NOT IN clause into an indexed lookup.

Converting a nested sql where-in pattern to joins

I have a query that is returning the correct data to me, but being a developer rather than a DBA I'm wondering if there is any reason to convert it to joins rather than nested selects and if so, what it would look like.
My code currently is
select * from adjustments where store_id in (
select id from stores where original_id = (
select original_id from stores where name ='abcd'))
Any references to the better use of joins would be appreciated too.
Besides any likely performance improvements, I find the following much easier to read.
SELECT *
FROM adjustments a
INNER JOIN stores s ON s.id = a.store_id
INNER JOIN stores s2 ON s2.original_id = s.original_id
WHERE s2.name = 'abcd'
Test script showing my original mistake of omitting original_id
DECLARE @Adjustments TABLE (store_id INTEGER)
DECLARE @Stores TABLE (id INTEGER, name VARCHAR(32), original_id INTEGER)
INSERT INTO @Adjustments VALUES (1), (2), (3)
INSERT INTO @Stores VALUES (1, 'abcd', 1), (2, '2', 1), (3, '3', 1)
/*
OP's original statement returns store_ids 1, 2 & 3
because original_id is the same for all three stores
*/
SELECT * FROM @Adjustments WHERE store_id IN (
SELECT id FROM @Stores WHERE original_id = (
SELECT original_id FROM @Stores WHERE name ='abcd'))
/*
Faulty first attempt, which removed original_id from the equation,
only returns store_id 1
*/
SELECT a.store_id
FROM @Adjustments a
INNER JOIN @Stores s ON s.id = a.store_id
WHERE s.name = 'abcd'
If you would use joins, it would look like this:
select *
from adjustments
inner join stores on stores.id = adjustments.store_id
inner join stores as stores2 on stores2.original_id = stores.original_id
where stores2.name = 'abcd'
(Apparently you can omit the second SELECT on the stores table (I left it out of my query) because if I'm interpreting your table structure correctly,
select id from stores where original_id = (select original_id from stores where name ='abcd')
is the same as
select * from stores where name ='abcd'.)
--> edited my query back to the original form, thanks to Lieven for pointing out my mistake in his answer!
I prefer using joins, but for simple queries like that, there is normally no performance difference. SQL Server treats both queries the same internally.
If you want to be sure, you can look at the execution plan.
If you run both queries together, SQL Server will also tell you which query took more resources than the other (in percent).
A slightly different approach:
select * from adjustments a where exists
(select null from stores s1, stores s2
where a.store_id = s1.id and s1.original_id = s2.original_id and s2.name ='abcd')
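The nested IN, the self-join, and the EXISTS form can be compared on the sample data from the test script. This sketch ports it to an in-memory SQLite database driven from Python (an assumption, since the thread is about SQL Server); store 4 and its adjustment are an extra made-up row with a different original_id, added so the filter has something to exclude.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE adjustments (store_id INTEGER);
CREATE TABLE stores (id INTEGER, name TEXT, original_id INTEGER);
INSERT INTO adjustments VALUES (1), (2), (3), (4);
-- Stores 1-3 share original_id 1; store 4 (hypothetical) belongs to another group.
INSERT INTO stores VALUES (1, 'abcd', 1), (2, '2', 1), (3, '3', 1), (4, 'other', 2);
""")

def store_ids(sql):
    """Return the store_id column of the result as a plain list."""
    return [row[0] for row in conn.execute(sql)]

nested = store_ids("""
SELECT store_id FROM adjustments WHERE store_id IN (
    SELECT id FROM stores WHERE original_id = (
        SELECT original_id FROM stores WHERE name = 'abcd'))
ORDER BY store_id
""")

# Join form: the name filter goes on the second stores alias.
joined = store_ids("""
SELECT a.store_id
FROM adjustments a
INNER JOIN stores s ON s.id = a.store_id
INNER JOIN stores s2 ON s2.original_id = s.original_id
WHERE s2.name = 'abcd'
ORDER BY a.store_id
""")

exists = store_ids("""
SELECT a.store_id FROM adjustments a WHERE EXISTS (
    SELECT NULL FROM stores s1, stores s2
    WHERE a.store_id = s1.id AND s1.original_id = s2.original_id AND s2.name = 'abcd')
ORDER BY a.store_id
""")

# Dropping the original_id hop only finds the store that is literally named 'abcd'.
faulty = store_ids("""
SELECT a.store_id
FROM adjustments a
INNER JOIN stores s ON s.id = a.store_id
WHERE s.name = 'abcd'
""")

print(nested, joined, exists, faulty)
```

Note that the join form would return duplicate rows if several stores named 'abcd' shared an original_id, whereas the IN and EXISTS forms would not.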
As Microsoft's documentation says:
Many Transact-SQL statements that include subqueries can be
alternatively formulated as joins. Other questions can be posed only
with subqueries. In Transact-SQL, there is usually no performance
difference between a statement that includes a subquery and a
semantically equivalent version that does not. However, in some cases
where existence must be checked, a join yields better performance.
Otherwise, the nested query must be processed for each result of the
outer query to ensure elimination of duplicates. In such cases, a join
approach would yield better results.
Your case is exactly one where a join and a subquery give the same performance.
An example where a subquery cannot be converted to a "simple" JOIN:
select Country,TR_Country.Name as Country_Translated_Name,TR_Country.Language_Code
from Country
JOIN TR_Country ON Country.Country=Tr_Country.Country
where country =
(select top 1 country
from Northwind.dbo.Customers C
join
Northwind.dbo.Orders O
on C.CustomerId = O.CustomerID
group by country
order by count(*))
As you can see, every country can have several name translations, so we cannot just join and count records (in that case, countries with more translations would get higher record counts).
Of course, you can transform this example to:
JOIN with derived table
CTE
but that is another story :-)