How to convert [NULL] to NULL in Postgres SQL statement? - sql

In Postgres, I set out to write a SQL statement that would return various fields from one table, along with a column containing an array of tag strings that come from another table. I've made quite good progress with this code:
SELECT p.photo_id, p.name, p.path, array_agg(t.tag) as tags FROM photos p
JOIN users u USING (user_id)
LEFT JOIN photo_tags pt USING (photo_id)
LEFT JOIN tags t USING (tag_id)
WHERE u.user_id = 'some_uuid'
GROUP BY p.photo_id, p.name, p.path
ORDER BY date(p.date_created) DESC, p.date_created ASC
Everything is working exactly like I intended except for one thing: If a given photo has no tags attached to it then this is being returned: [NULL]
I would prefer to return just NULL rather than null in an array. I've tried several things, including using coalesce and ifnull but couldn't fix things precisely the way I want.
Not the end of the world if an array with NULL is returned by the endpoint but if you know a way to return just NULL instead, I would appreciate learning how to do this.

You can filter out NULLs during aggregation.
If no non-null tag remains, you should get NULL instead of [NULL]:
SELECT array_agg(t.tag) filter (where t.tag is not null) as tags
FROM ...
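To see why the FILTER clause changes [NULL] into NULL, here is a small Python model of the aggregate for a single group. It is only a sketch of the semantics, not PostgreSQL code: the function name array_agg, the keep parameter, and the sample tags are illustrative assumptions.

```python
def array_agg(values, keep=lambda v: True):
    """Toy model of array_agg(expr) FILTER (WHERE keep(expr)) for one group."""
    kept = [v for v in values if keep(v)]
    # With no input rows left, the aggregate yields NULL (None here), not an empty array.
    return kept if kept else None


# A photo with no tags still produces one LEFT JOIN row whose t.tag is NULL.
untagged_photo = [None]
tagged_photo = ["beach", "sunset"]  # made-up sample tags

unfiltered = array_agg(untagged_photo)
filtered = array_agg(untagged_photo, keep=lambda v: v is not None)
tagged = array_agg(tagged_photo, keep=lambda v: v is not None)
print(unfiltered, filtered, tagged)
```

The unfiltered call keeps the single NULL row and so returns a one-element array, while the filtered call has nothing left to aggregate and returns NULL.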

I would go with a subquery in your case:
SELECT p.photo_id, p.name, p.path, t.agg_tags AS tags
FROM photos p
JOIN users u USING (user_id)
LEFT JOIN (
SELECT pt.photo_id, array_agg(tg.tag) AS agg_tags -- one array per photo
FROM photo_tags pt
JOIN tags tg USING (tag_id)
GROUP BY pt.photo_id
) t USING (photo_id) -- photos without tags get NULL, not [NULL]
WHERE u.user_id = 'some_uuid'
ORDER BY date(p.date_created) DESC, p.date_created ASC
You did not post much information about your schema, table sizes, and so on, but a LATERAL join could be an option to build on the syntax above.

Related

Optimize join query with need to "filter" for distinct

I'm wondering how much more I can optimize a relatively simple query. For abstraction, I only have 2 tables, albums, with album_id and album_title and the other is albums_genre with album_id and album_genre as columns.
My query is the following:
SELECT DISTINCT album_title
from albums
INNER JOIN albums_genre
ON albums.album_id = albums_genre.album_id AND (genre = 'Metal' OR genre = 'Jazz')
The problem is that in the albums table one id can match multiple album titles. I'm trying to cut down the time needed for DISTINCT, since it takes more than half of the overall time. I'm using a Microsoft DB.
Thanks to everyone in advance!
A more efficient approach here is to use EXISTS instead of a join. You can also use IN instead of OR to check a column against multiple values:
SELECT album_title
from albums a
where exists (
select 1 from albums_genre g
where a.album_id = g.album_id
and genre in ('Metal', 'Jazz')
)
The first step is to write the query using exists:
SELECT a.album_title
FROM albums a
WHERE EXISTS (SELECT 1
FROM albums_genre ag
WHERE ag.album_id = a.album_id AND
ag.genre IN ('Metal', 'Jazz')
);
Then, for performance, you want an index on albums_genre(album_id, genre). An index on just (album_id) is probably also good enough.
Note that the performance gain comes from no longer needing the duplicate elimination (DISTINCT) in the outer query.
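The difference between the two query shapes can be checked on a tiny data set. This is a minimal sketch using Python's built-in sqlite3 module rather than a Microsoft database, with made-up album rows and a hypothetical index name; it only shows that the join produces duplicate titles that DISTINCT would then have to remove, while EXISTS does not.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE albums (album_id INTEGER, album_title TEXT);
    CREATE TABLE albums_genre (album_id INTEGER, genre TEXT);
    -- Made-up sample data: album 1 matches both requested genres.
    INSERT INTO albums VALUES (1, 'Blue Train'), (2, 'Paranoid'), (3, 'Thriller');
    INSERT INTO albums_genre VALUES
        (1, 'Jazz'), (1, 'Metal'), (2, 'Metal'), (2, 'Rock'), (3, 'Pop');
    -- Index suggested in the answer (the index name is an assumption).
    CREATE INDEX albums_genre_album_genre ON albums_genre (album_id, genre);
""")

join_sql = """
    SELECT a.album_title
    FROM albums a
    JOIN albums_genre ag ON ag.album_id = a.album_id
    WHERE ag.genre IN ('Metal', 'Jazz')
"""
exists_sql = """
    SELECT a.album_title
    FROM albums a
    WHERE EXISTS (SELECT 1
                  FROM albums_genre ag
                  WHERE ag.album_id = a.album_id
                    AND ag.genre IN ('Metal', 'Jazz'))
"""

# The join emits one row per matching genre; EXISTS emits one row per album.
join_rows = sorted(row[0] for row in conn.execute(join_sql))
exists_rows = sorted(row[0] for row in conn.execute(exists_sql))
print(join_rows)
print(exists_rows)
```

One caveat: EXISTS returns one row per row in albums, so if two album rows share the same title, both still appear. In that case the result differs from SELECT DISTINCT album_title.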
The slowdown likely comes from the OR in the join condition. Move that filter into a subquery using IN, as shown below, and use GROUP BY instead of DISTINCT:
SELECT album_title
from albums
INNER JOIN (select album_id from albums_genre where genre in ('Metal'
,'Jazz'))albums_genre
ON albums.album_id = albums_genre.album_id
group by album_title

SQL result duplicates the photos

I want to show all the photos that have the specific tag, but it only duplicates the photos. If I choose another tag, it doesn't show duplicated photos.
For the tag "Natur" it should only be 2 photos and for the tag "Berg" it should only be 1 photo.
SQL
SELECT *
FROM photos AS p
JOIN tags_photos AS tp
JOIN tags_names AS tn
ON tp.id_tag = tn.id
WHERE tn.data_name_seo = :name_seo
ORDER BY p.datetime_taken DESC
Table: tags_photos (id, id_photo, id_tag)
Table: tags_names (id, data_name, data_name_seo)
Table: photos (id, data_file_name, datetime_taken)
Have I missed something or what's the problem?
You are missing join conditions for the first two tables. This is probably the cause of your problem:
SELECT *
FROM photos AS p JOIN
tags_photos AS tp
ON tp.id_photo = p.id JOIN
tags_names AS tn
ON tp.id_tag = tn.id
WHERE tn.data_name_seo = :name_seo
ORDER BY p.datetime_taken DESC
In most databases, the missing on clause would generate an error. In MySQL, the JOIN is treated as a CROSS JOIN, which likely would result in duplicates.
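The effect of the missing join condition can be reproduced with a minimal sketch. It uses Python's built-in sqlite3 module, which, like MySQL, accepts a JOIN without ON and treats it as a cross join. The table and column names follow the question; the rows and the run helper are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE photos (id INTEGER PRIMARY KEY, data_file_name TEXT);
    CREATE TABLE tags_names (id INTEGER PRIMARY KEY, data_name TEXT, data_name_seo TEXT);
    CREATE TABLE tags_photos (id INTEGER PRIMARY KEY, id_photo INTEGER, id_tag INTEGER);
    -- Made-up sample data: two photos tagged 'natur', one tagged 'berg'.
    INSERT INTO photos VALUES (1, 'a.jpg'), (2, 'b.jpg'), (3, 'c.jpg');
    INSERT INTO tags_names VALUES (1, 'Natur', 'natur'), (2, 'Berg', 'berg');
    INSERT INTO tags_photos VALUES (1, 1, 1), (2, 2, 1), (3, 3, 2);
""")

# No ON clause between photos and tags_photos: every photo pairs with every tag link.
broken_sql = """
    SELECT p.id
    FROM photos AS p
    JOIN tags_photos AS tp
    JOIN tags_names AS tn ON tp.id_tag = tn.id
    WHERE tn.data_name_seo = ?
"""
# Same query with the missing join condition added.
fixed_sql = """
    SELECT p.id
    FROM photos AS p
    JOIN tags_photos AS tp ON tp.id_photo = p.id
    JOIN tags_names AS tn ON tp.id_tag = tn.id
    WHERE tn.data_name_seo = ?
"""


def run(sql, name_seo):
    """Return the sorted photo ids the query yields for one tag."""
    return sorted(row[0] for row in conn.execute(sql, (name_seo,)))


broken = run(broken_sql, "natur")
fixed = run(fixed_sql, "natur")
print(broken)
print(fixed)
```

With this data the broken query returns every photo once per 'natur' link (three photos times two links), while the fixed query returns only the two photos that carry the tag.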

PostgreSQL - GROUP BY clause

I want to search by tags, and then list all articles with that tag, and also how many of given tags they match. So for example I might have:
Page1 - 2 (has css and php tag)
Page2 - 1 (has only css tag)
Query:
SELECT COUNT(t.tag)
FROM a_tags t
JOIN w_articles2tag a2t ON a2t.tag = t.id
JOIN w_article a ON a.id = a2t.article
WHERE t.tag = 'css' OR t.tag = 'php'
GROUP BY t.tag
LIMIT 9
When I only put COUNT(t.tag) the query works, and I get okay results. But if I append e.g. ID of my article I get following error:
ERROR: column "a.title" must appear in the GROUP BY clause or be used in an aggregate function
LINE 1: SELECT COUNT(t.tag), a.title FROM a_tags t
How to add said columns to this query?
This works in Postgres 9.1 or later. Quoting the release notes of 9.1:
Allow non-GROUP BY columns in the query target list when the primary
key is specified in the GROUP BY clause (Peter Eisentraut)
The SQL standard allows this behavior, and because of the primary key,
the result is unambiguous.
Related:
Return a grouped list with occurrences using Rails and PostgreSQL
The queries in the question and in @Michael's answer have the logic backwards. We want to count how many tags match per article, not how many articles have a certain tag. So we need to GROUP BY w_article.id, not by a_tags.id.
list all articles with that tag, and also how many of given tags they match
To fix this:
SELECT count(t.tag) AS ct, a.* -- any column from table a allowed ...
FROM a_tags t
JOIN w_articles2tag a2t ON a2t.tag = t.id
JOIN w_article a ON a.id = a2t.article
WHERE t.tag IN ('css', 'php')
GROUP BY a.id -- ... since PK is in GROUP BY
LIMIT 9;
Assuming id is the primary key of w_article.
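As a quick check of the grouping logic, here is a minimal sketch using Python's built-in sqlite3 module with made-up rows matching the example in the question. Note that SQLite is more permissive than Postgres about non-grouped columns, so it would not reproduce the original error; the sketch only shows that grouping by the article's primary key gives the per-article tag counts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a_tags (id INTEGER PRIMARY KEY, tag TEXT);
    CREATE TABLE w_article (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE w_articles2tag (article INTEGER, tag INTEGER);
    -- Made-up sample data: Page1 has css and php, Page2 only css, Page3 only html.
    INSERT INTO a_tags VALUES (1, 'css'), (2, 'php'), (3, 'html');
    INSERT INTO w_article VALUES (1, 'Page1'), (2, 'Page2'), (3, 'Page3');
    INSERT INTO w_articles2tag VALUES (1, 1), (1, 2), (2, 1), (3, 3);
""")

sql = """
    SELECT a.title, count(t.tag) AS ct
    FROM a_tags t
    JOIN w_articles2tag a2t ON a2t.tag = t.id
    JOIN w_article a ON a.id = a2t.article
    WHERE t.tag IN ('css', 'php')
    GROUP BY a.id              -- group per article, not per tag
    ORDER BY ct DESC, a.title  -- ordering added here only for a stable result
"""
matches = conn.execute(sql).fetchall()
print(matches)
```

Page3 is absent from the result because none of its tags are in the searched list.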
However, this form will be faster while doing the same:
SELECT a.*, ct
FROM (
SELECT a2t.article AS id, count(*) AS ct
FROM a_tags t
JOIN w_articles2tag a2t ON a2t.tag = t.id
WHERE t.tag IN ('css', 'php') -- same filter as in the query above
GROUP BY 1
LIMIT 9 -- LIMIT early - cheaper
) sub
JOIN w_article a USING (id); -- attached alias to article in the sub
Closely related answer from just yesterday:
Why does the following join increase the query time significantly?
When you use a GROUP BY clause, every selected column that is not in the GROUP BY list must be wrapped in an aggregate function. Try adding title to the GROUP BY list, or selecting min(a.title) instead.
SELECT COUNT(t.tag), a.title FROM a_tags t
JOIN w_articles2tag a2t ON a2t.tag = t.id
JOIN w_article a ON a.id = a2t.article
WHERE t.tag = 'css' OR t.tag = 'php' GROUP BY t.tag, a.title LIMIT 9

Using UNNEST with a JOIN

I want to be able to use unnest() function in PostgreSQL in a complicated SQL query that has many JOINs. Here's the example query:
SELECT 9 as keyword_id, COUNT(DISTINCT mentions.id) as total, tags.parent_id as tag_id
FROM mentions
INNER JOIN taggings ON taggings.mention_id = mentions.id
INNER JOIN tags ON tags.id = taggings.tag_id
WHERE mentions.taglist && ARRAY[9] AND mentions.search_id = 3
GROUP BY tags.parent_id
I want to eliminate the taggings table here, because my mentions table has an integer array field named taglist that consists of all linked tag ids of mentions.
I tried following:
SELECT 9 as keyword_id, COUNT(DISTINCT mentions.id) as total, tags.parent_id as tag_id
FROM mentions
INNER JOIN tags ON tags.id IN (SELECT unnest(taglist))
WHERE mentions.taglist && ARRAY[9] AND mentions.search_id = 3
GROUP BY tags.parent_id
This works but brings different results than the first query.
So what I want to do is to use the result of the SELECT unnest(taglist) in a JOIN query to compensate for the taggings table.
How can I do that?
UPDATE: taglist is the same set as the respective list of tag ids of mention.
Technically, your query might work like this (not entirely sure about the objective of this query):
SELECT 9 AS keyword_id, count(DISTINCT m.id) AS total, t.parent_id AS tag_id
FROM (
SELECT unnest(m.taglist) AS tag_id
FROM mentions m
WHERE m.search_id = 3
AND 9 = ANY (m.taglist)
) m
JOIN tags t USING (tag_id) -- assumes tags.tag_id!
GROUP BY t.parent_id;
However, it seems to me you are going in the wrong direction here. Normally one would remove the redundant array taglist and keep the normalized database schema. Then your original query should serve well, with the syntax shortened by aliases:
SELECT 9 AS keyword_id, count(DISTINCT m.id) AS total, t.parent_id AS tag_id
FROM mentions m
JOIN taggings mt ON mt.mention_id = m.id
JOIN tags t ON t.id = mt.tag_id
WHERE 9 = ANY (m.taglist)
AND m.search_id = 3
GROUP BY t.parent_id;
Unravel the mystery
<rant>
The root cause for your "different results" is the unfortunate naming convention that some intellectually challenged ORMs impose on people.
I am speaking of id as column name. Never use this anti-pattern in a database with more than one table. Right, that means basically any database. As soon as you join a bunch of tables (that's what you do in a database) you end up with a bunch of columns named id. Utterly pointless.
The ID column of a table named tag should be tag_id (unless there is another descriptive name). Never id.
</rant>
Your query inadvertently counts tags instead of mentions:
SELECT 25 AS keyword_id, count(m.id) AS total, t.parent_id AS tag_id
FROM (
SELECT unnest(m.taglist) AS id
FROM mentions m
WHERE m.search_id = 4
AND 25 = ANY (m.taglist)
) m
JOIN tags t USING (id)
GROUP BY t.parent_id;
It should work this way:
SELECT 25 AS keyword_id, count(DISTINCT m.id) AS total, t.parent_id
FROM (
SELECT m.id, unnest(m.taglist) AS tag_id
FROM mentions m
WHERE m.search_id = 4
AND 25 = ANY (m.taglist)
) m
JOIN tags t ON t.id = m.tag_id
GROUP BY t.parent_id;
I also added back the DISTINCT to your count() that got lost along the way in your query.
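The effect of DISTINCT in that count can be illustrated with a minimal sketch using Python's built-in sqlite3 module. SQLite has no unnest(), so a hypothetical table named unnested stands in for the output of the subquery (one row per mention and tag id); the rows and the per_parent helper are also made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tags (id INTEGER PRIMARY KEY, parent_id INTEGER);
    -- Stand-in for: SELECT m.id, unnest(m.taglist) AS tag_id FROM mentions m ...
    CREATE TABLE unnested (mention_id INTEGER, tag_id INTEGER);
    -- Made-up data: mention 1 has tags {25, 7, 8}, mention 2 has tags {25, 7}.
    INSERT INTO tags VALUES (25, 100), (7, 100), (8, 200);
    INSERT INTO unnested VALUES (1, 25), (1, 7), (1, 8), (2, 25), (2, 7);
""")


def per_parent(count_expr):
    """Run the grouped query with the given count expression."""
    sql = f"""
        SELECT t.parent_id, {count_expr} AS total
        FROM unnested m
        JOIN tags t ON t.id = m.tag_id
        GROUP BY t.parent_id
        ORDER BY t.parent_id
    """
    return conn.execute(sql).fetchall()


plain = per_parent("count(m.mention_id)")              # counts joined rows
distinct = per_parent("count(DISTINCT m.mention_id)")  # counts mentions
print(plain)
print(distinct)
```

Without DISTINCT, a mention is counted once for every one of its tags that falls under the same parent, which inflates the total for parent 100 in this data.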
Something like this should work:
...
tags t INNER JOIN
(SELECT UNNEST(taglist) as idd) a ON t.id = a.idd
...

How can I eliminate duplicates in a GridView?

I am retrieving data from three tables, so I wrote the following query.
The result is correct, but records are repeated. What is the problem with the query?
I am binding the result of the query to a GridView control. Please help.
SELECT DISTINCT (tc.coursename), ur.username, uc.[date], 'Paid' AS Status
FROM tblcourse tc, tblusereg ur, dbo.UserCourse uc
WHERE tc.courseid IN (SELECT ur1.courseid
FROM dbo.UserCourse ur1
WHERE ur1.userid = @userid)
AND ur.userid = @userid
AND uc.[date] IS NOT NULL
AND ur.[course-id] = uc.[course-id]
There is no join condition between tblcourse tc and tblusereg ur, so you get a cross join despite the IN (which is effectively a join).
DISTINCT works on the whole row, not on one column.
Note: you reference dbo.UserCourse twice but with different column names, courseid and [course-id].
Rewritten with JOINs.
select distinct
tc.coursename, ur.username, uc.[date], 'Paid' as [Status]
from
dbo.tblcourse tc
JOIN
dbo.tblusereg ur ON tc.courseid = ur.[course-id]
JOIN
dbo.UserCourse uc ON ur.[course-id] = uc.[course-id]
where
ur.userid = @userid
and
uc.[date] is not null
This may help clarify the problem.
Change the first part of your query from
select distinct (tc.coursename),
to
select distinct tc.coursename,
The two forms behave identically: DISTINCT applies to all selected columns, not just tc.coursename, and the parentheses only suggest otherwise.
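A minimal sketch of how DISTINCT treats the parentheses, using Python's built-in sqlite3 module as a stand-in for SQL Server. The enrolments table and its rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE enrolments (coursename TEXT, username TEXT);
    -- Made-up data: one exact duplicate row, one row differing only in username.
    INSERT INTO enrolments VALUES ('SQL', 'ann'), ('SQL', 'bob'), ('SQL', 'ann');
""")

with_parens = conn.execute(
    "SELECT DISTINCT (coursename), username FROM enrolments ORDER BY username"
).fetchall()
without_parens = conn.execute(
    "SELECT DISTINCT coursename, username FROM enrolments ORDER BY username"
).fetchall()
print(with_parens)
print(without_parens)
```

Both queries collapse only the exact duplicate row. The course name 'SQL' still appears twice because the rows differ in username.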