Populate Temp Table Postgres - sql

I have the following three tables in the postgres db of my django app:
publication {
id
title
}
tag {
id
title
}
publication_tags{
id
publication_id
tag_id
}
Where tag and publication have a many to many relationship.
I'd like to make a temp table with three columns: 1) publication title, 2) publication id, and 3) tags, where tags is a list (as a string if possible) of all the tags on a given publication.
Thus far I have made the temp table and populated it with the publication id and publication title, but I don't know how to get the tags into it. This is what I have so far:
CREATE TEMP TABLE pubtags (pub_id INTEGER, pub_title VARCHAR(50), pub_tags VARCHAR(50));
INSERT INTO pubtags(pub_id, pub_title) SELECT id, title FROM apricot_app_publication;
Can anyone advise me on how I would go about the last step?

Sounds like a job for string_agg:
string_agg(expression, delimiter)
input values concatenated into a string, separated by delimiter
So something like this should do the trick:
insert into pubtags (pub_id, pub_title, pub_tags)
select p.id, p.title, string_agg(t.title, ', ')
from publication p
join publication_tags pt on (p.id = pt.publication_id)
join tag t on (pt.tag_id = t.id)
group by p.id, p.title;
You may want to adjust the delimiter, I guessed that a comma would make sense.
I'd recommend using TEXT instead of VARCHAR(50) for pub_tags so that the aggregated string can't exceed the column's length limit. In fact, I'd recommend TEXT over VARCHAR in general: PostgreSQL treats them both the same, except that it spends time on length checks for VARCHAR, so VARCHAR is pointless unless you specifically need a limited length.
Also, if you don't specifically need pub_tags to be a string, you could use an array instead:
CREATE TEMP TABLE pubtags (
pub_id INTEGER,
pub_title TEXT,
pub_tags TEXT[]
)
and array_agg instead of string_agg:
insert into pubtags (pub_id, pub_title, pub_tags)
select p.id, p.title, array_agg(t.title)
-- as above...
Using an array will make it a lot easier to unpack the tags if you need to.
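For anyone who wants to try the aggregation without a PostgreSQL server at hand, here is a minimal sketch using Python's built-in sqlite3 module. It is only a stand-in: SQLite's group_concat plays the role of string_agg, the sample rows are made up, and the table names follow the simplified schema from the question rather than the real apricot_app_* tables.

```python
import sqlite3


def build_pubtags(conn):
    """Create and fill a temp table with one row per publication and its tags."""
    conn.execute(
        "CREATE TEMP TABLE pubtags (pub_id INTEGER, pub_title TEXT, pub_tags TEXT)"
    )
    # group_concat is SQLite's counterpart of PostgreSQL's string_agg.
    conn.execute(
        """
        INSERT INTO pubtags (pub_id, pub_title, pub_tags)
        SELECT p.id, p.title, group_concat(t.title, ', ')
        FROM publication p
        JOIN publication_tags pt ON p.id = pt.publication_id
        JOIN tag t ON pt.tag_id = t.id
        GROUP BY p.id, p.title
        """
    )
    return conn.execute(
        "SELECT pub_id, pub_title, pub_tags FROM pubtags ORDER BY pub_id"
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE publication (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE tag (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE publication_tags (
        id INTEGER PRIMARY KEY, publication_id INTEGER, tag_id INTEGER
    );
    -- Made-up sample data for illustration only.
    INSERT INTO publication VALUES (1, 'First paper'), (2, 'Second paper');
    INSERT INTO tag VALUES (1, 'sql'), (2, 'django');
    INSERT INTO publication_tags VALUES (1, 1, 1), (2, 1, 2), (3, 2, 1);
    """
)
rows = build_pubtags(conn)
for row in rows:
    print(row)
```

Note that the inner joins skip publications that have no tags at all; a LEFT JOIN would keep them with an empty tag list. The order of the tags inside each string is not guaranteed unless the aggregate is given an explicit ordering.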

Related

SQL: combine two tables for a query

I want to query two tables at a time to find the key for an artist given their name. The issue is that my data is coming from disparate sources and there is no definitive standard for the presentation of their names (e.g. Forename Surname vs. Surname, Forename) and so to this end I have a table containing definitive names used throughout the rest of my system along with a separate table of aliases to match the varying styles up to each artist.
This is PostgreSQL but apart from the text type it's pretty standard. Substitute character varying if you prefer:
create table Artists (
id serial primary key,
name text,
-- other stuff not relevant
);
create table Aliases (
artist integer references Artists(id) not null,
name text not null
);
Now I'd like to be able to query both sets of names in a single query to obtain the appropriate id. Any way to do this? e.g.
select id from ??? where name = 'Bloggs, Joe';
I'm not interested in revising my schema's idea of what a "name" is to something more structured, e.g. separate forename and surname, since it's inappropriate for the application. Most of my sources don't structure the data, sometimes one or the other name isn't known, it may be a pseudonym, or sometimes the "artist" may be an entity such as a studio.
I think you want:
select a.id
from artists a
where a.name = 'Bloggs, Joe' or
exists (select 1
from aliases aa
where aa.artist = a.id and
aa.name = 'Bloggs, Joe'
);
Actually, if you just want the id (and not other columns), then you can use:
select a.id
from artists a
where a.name = 'Bloggs, Joe'
union all -- union if there could be duplicates
select aa.artist
from aliases aa
where aa.name = 'Bloggs, Joe';
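A small sketch of the union variant, run against an in-memory SQLite database through Python's sqlite3 module as a stand-in for PostgreSQL. The helper name find_artist_ids and the sample rows are made up for illustration; it uses UNION rather than UNION ALL so that an artist whose canonical name and alias both match is reported once.

```python
import sqlite3


def find_artist_ids(conn, name):
    """Return ids of artists whose canonical name or any alias equals name."""
    rows = conn.execute(
        """
        SELECT a.id FROM artists a WHERE a.name = ?
        UNION
        SELECT aa.artist FROM aliases aa WHERE aa.name = ?
        """,
        (name, name),
    ).fetchall()
    return sorted(r[0] for r in rows)


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE artists (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE aliases (
        artist INTEGER NOT NULL REFERENCES artists(id),
        name TEXT NOT NULL
    );
    -- Made-up sample data for illustration only.
    INSERT INTO artists VALUES (1, 'Joe Bloggs'), (2, 'Studio X');
    INSERT INTO aliases VALUES (1, 'Bloggs, Joe'), (1, 'J. Bloggs');
    """
)
by_alias = find_artist_ids(conn, "Bloggs, Joe")
by_name = find_artist_ids(conn, "Joe Bloggs")
missing = find_artist_ids(conn, "Nobody")
print(by_alias, by_name, missing)
```

Passing the name as a bound parameter keeps names containing quotes or commas from breaking the SQL.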

SQLite, Many to many relations, How to aggregate?

I have the classic arrangement for a many-to-many relation in a small flashcard-like application built using SQLite. Every card can have multiple tags, and every tag can have multiple cards. Each of these two entities has its own table, with a third table to link records.
This is the table for Cards:
CREATE TABLE Cards (CardId INTEGER PRIMARY KEY AUTOINCREMENT,
Text TEXT NOT NULL,
Answer INTEGER NOT NULL,
Success INTEGER NOT NULL,
Fail INTEGER NOT NULL);
This is the table for Tags:
CREATE TABLE Tags (TagId INTEGER PRIMARY KEY AUTOINCREMENT,
Name TEXT UNIQUE NOT NULL);
This is the cross reference table:
CREATE TABLE CardsRelatedToTags (CardId INTEGER,
TagId INTEGER,
PRIMARY KEY (CardId, TagId));
I need to get a table of cards with their associated tags in a column separated by commas.
I can already get what I need for a single row knowing its Id with the following query:
SELECT Cards.CardId, Cards.Text,
(SELECT group_concat(Tags.Name, ', ') FROM Tags
JOIN CardsRelatedToTags ON CardsRelatedToTags.TagId = Tags.TagId
WHERE CardsRelatedToTags.CardId = 1) AS TagsList
FROM Cards
WHERE Cards.CardId = 1
This will result in something like this:
CardId | Text | TagsList
1 | Some specially formatted text | Tag1, Tag2, TagN...
How can I get this type of result (TagsList from group_concat) for every row in Cards using a SQL query? Is it advisable to do so from a performance point of view? Or do I need to do this sort of "presentation" work in application code, using a simpler request to the DB?
Answering your code question:
SELECT
c.CardId,
c.Text,
GROUP_CONCAT(t.Name,', ') AS TagsList
FROM
Cards c
JOIN CardsRelatedToTags crt ON
c.CardId = crt.CardId
JOIN Tags t ON
crt.TagId = t.TagId
-- no WHERE clause, so every card is returned; add WHERE c.CardId = 1 for a single card
GROUP BY c.CardId, c.Text
Now, to the matter of performance. Databases are powerful tools and can do much more than simple SELECT statements. You can definitely do what you need inside the DB (even SQLite). It is usually bad practice to use a SELECT statement as the source of a single column inside another SELECT (a correlated subquery), because the inner query has to be evaluated again for each row of the outer query.
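Here is a minimal sketch of the join-and-group approach for every card at once, using Python's built-in sqlite3 module. The Cards table is trimmed to two columns and the sample rows are made up. It uses LEFT JOIN instead of a plain JOIN, an assumption on my part that cards without tags should still appear in the result.

```python
import sqlite3


def cards_with_tags(conn):
    """Return (CardId, Text, [tag names]) for every card, tagged or not."""
    rows = conn.execute(
        """
        SELECT c.CardId, c.Text, group_concat(t.Name, ', ') AS TagsList
        FROM Cards c
        LEFT JOIN CardsRelatedToTags crt ON c.CardId = crt.CardId
        LEFT JOIN Tags t ON crt.TagId = t.TagId
        GROUP BY c.CardId, c.Text
        ORDER BY c.CardId
        """
    ).fetchall()
    # Split the aggregated string back into a sorted list for the application.
    return [
        (card_id, text, sorted(tags.split(", ")) if tags else [])
        for card_id, text, tags in rows
    ]


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Cards (CardId INTEGER PRIMARY KEY AUTOINCREMENT,
                        Text TEXT NOT NULL);
    CREATE TABLE Tags (TagId INTEGER PRIMARY KEY AUTOINCREMENT,
                       Name TEXT UNIQUE NOT NULL);
    CREATE TABLE CardsRelatedToTags (CardId INTEGER,
                                     TagId INTEGER,
                                     PRIMARY KEY (CardId, TagId));
    -- Made-up sample data for illustration only.
    INSERT INTO Cards (CardId, Text) VALUES (1, 'First card'), (2, 'Untagged card');
    INSERT INTO Tags (TagId, Name) VALUES (1, 'Tag1'), (2, 'Tag2');
    INSERT INTO CardsRelatedToTags VALUES (1, 1), (1, 2);
    """
)
cards = cards_with_tags(conn)
for card in cards:
    print(card)
```

With the join form the database walks the link table once and groups the result, instead of running a separate subquery per card.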

Search for data that matches every single tag (using the LIKE operator)

For the past two days I've been doing my best to put together a query that will pull data based on inputted tags. The purpose is an autocomplete field, where the words the user inputs are split into tags. I really need to use the LIKE operator, because the whole purpose of autocomplete is that the user does not need to write out full words.
CREATE TABLE `movies` (
`id` int(10) unsigned NOT NULL,
`name` varchar(250) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
CREATE TABLE `tags` (
`tag` varchar(50) NOT NULL DEFAULT '',
`mid` int(10) unsigned NOT NULL,
KEY `mid` (`mid`),
KEY `alphabetizer` (`tag`),
CONSTRAINT `tags_ibfk_1` FOREIGN KEY (`mid`) REFERENCES `movies` (`id`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
Current query:
SELECT *
FROM movies m
JOIN tags t ON m.id = t.mid
WHERE t.tag LIKE 'Dawn%' OR t.tag LIKE 'of%'
GROUP BY m.id
HAVING COUNT(DISTINCT t.tag) = 2
EDIT: The issue is that the more tags are added, the vaguer the results get. This is the opposite of the intended effect.
SELECT m.id, group_concat(t.tag separator ', ') as tags
FROM movies m
JOIN tags t
ON m.id = t.mid
GROUP BY m.id
HAVING group_concat(t.tag) like '%DAWN%'
and group_concat(t.tag) like '%OF%'
NOTE: Are the tags uppercase, lowercase, or mixed? In the above query I assume they are all uppercase. You can use the UPPER or LOWER functions if the tags are not consistent, but ideally they should be stored consistently in upper or lower case.
In the above query I used group_concat to show all tags on one row for each id. That is MySQL-specific syntax (you didn't mention what database you're using). In PostgreSQL you would use string_agg; in Oracle you would use LISTAGG.
What if you do something like this
select mid,
group_concat(tag separator ' ') as fulltag
from tags
group by mid
The above query builds the full tag list for each movie. For example, if movie id 1 has the tags dawn, of and planet, the query result would be
1 dawn of planet
Then do a join like the one below. If the user enters the tags dawn and of together, you can join them into the single string dawn of before passing it to the query, and then use one LIKE operator against it. That way you don't need to chain multiple LIKE conditions together.
So essentially, if the user enters dawn as a tag, use like 'dawn%'.
If the user enters dawn, of and planet as tags, join them to make dawn of planet and use like 'dawn of planet%'.
You can do this joining of the tags in your app code and then pass the result as a parameter to your query.
select m.name,
tab.fulltag
from movies m
join
(
select mid,
group_concat(tag separator ' ') as fulltag
from tags
group by mid
) tab on m.id = tab.mid
where tab.fulltag LIKE 'Dawn Of%'
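For the requirement in the question (every typed word has to match the start of some tag on the movie), here is a sketch of a different technique from the two answers above: one HAVING condition per typed word, so the result does not depend on the order in which the tags are concatenated. It runs against SQLite through Python's sqlite3 module rather than MySQL, the helper name movies_matching_all and the sample rows are made up, and it assumes the typed words contain no % or _ wildcards of their own.

```python
import sqlite3


def movies_matching_all(conn, prefixes):
    """Return (id, name) of movies that have a tag starting with every prefix."""
    if not prefixes:
        return []
    # One condition per prefix: at least one tag of the movie must match it.
    having = " AND ".join("SUM(t.tag LIKE ?) > 0" for _ in prefixes)
    sql = (
        "SELECT m.id, m.name FROM movies m "
        "JOIN tags t ON t.mid = m.id "
        "GROUP BY m.id, m.name "
        "HAVING " + having + " ORDER BY m.id"
    )
    # Only the placeholders are built dynamically; the values stay parameters.
    return conn.execute(sql, [p + "%" for p in prefixes]).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE movies (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE tags (tag TEXT NOT NULL, mid INTEGER NOT NULL REFERENCES movies(id));
    -- Made-up sample data for illustration only.
    INSERT INTO movies VALUES (1, 'Dawn of the Planet of the Apes'), (2, 'Dawn Patrol');
    INSERT INTO tags VALUES ('dawn', 1), ('of', 1), ('the', 1), ('planet', 1), ('apes', 1);
    INSERT INTO tags VALUES ('dawn', 2), ('patrol', 2);
    """
)
both_words = movies_matching_all(conn, ["Daw", "of"])
one_word = movies_matching_all(conn, ["daw"])
print(both_words, one_word)
```

Each extra word adds another condition, so the result set narrows as the user types more. SQLite's LIKE is case-insensitive for ASCII by default, which is why "Daw" matches the tag dawn here.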

Selecting distinct rows based on values from left table

Using Postgres. Here's my scenario:
I have three different tables. One is a title table. The second is a genre table. The third table is used to join the two. When I designed the database, I expected that each title would have one top level genre. After filling it with data, I discovered that there were titles that had two, and sometimes three, top level genres.
I wrote a query that retrieves titles and their top level genres. This obviously requires that I join the two tables. For those that only have one top level genre, there is one record. For those that have more, there are multiple records.
I realize I'll probably have to write a custom function of some kind that will handle this for me, but I thought I'd ask if it's possible to do this without doing so just to make sure I'm not missing anything.
Is it possible to write a query that will allow me to select all of the distinct titles regardless of the number of genres that it has, but also include the genre? Or even better, a query that would give me a comma delimited string of genres when there are multiples?
Thanks in advance!
Sounds like a job for array_agg to me. With tables like this:
create table t (id int not null, title varchar not null);
create table g (id int not null, name varchar not null);
create table tg (t int not null, g int not null);
You could do something like this:
SELECT t.title, array_agg(g.name)
FROM t, tg, g
WHERE t.id = tg.t
AND tg.g = g.id
GROUP BY t.title, t.id
to get:
title | array_agg
-------+-----------------------
one | {g-one,g-two,g-three}
three | {g-three}
two | {g-two}
Then just unpack the arrays as needed. If for some reason you really want a comma delimited string instead of an array, then string_agg is your friend:
SELECT t.title, string_agg(g.name, ',')
FROM t, tg, g
WHERE t.id = tg.t
AND tg.g = g.id
GROUP BY t.title, t.id
and you'll get something like this:
title | string_agg
-------+---------------------
one | g-one,g-two,g-three
three | g-three
two | g-two
I'd go with the array approach so that you wouldn't have to worry about reserving a character for the delimiter or having to escape (and then unescape) the delimiter while aggregating.
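The delimiter problem is easy to see with a couple of lines of Python. This is only an illustration with made-up genre names, one of which contains a comma:

```python
# Hypothetical genre names; the second one contains the delimiter itself.
genres = ["Sci-Fi", "Action, Adventure"]

# What a comma-delimited aggregate would hand back to the application.
as_string = ",".join(genres)

# Splitting on the delimiter no longer recovers the original two genres.
round_trip = as_string.split(",")

print(as_string)
print(len(genres), len(round_trip))
```

An array column keeps the element boundaries, so no escaping is needed.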
Have a look at this thread which might answer your question.

PostgreSQL query involving integer[]

I have 2 tables:
CREATE TABLE article (
id serial NOT NULL,
title text,
tags integer[] -- array of tag id's from TAG table
);
CREATE TABLE tag (
id serial NOT NULL,
description character varying(250) NOT NULL
);
... and need to select tags from TAG table held in ARTICLE's 'tags integer[]' based on article's title.
So tried something like
SELECT *
FROM tag
WHERE tag.id IN ( (select article.tags::int4
from article
where article.title = 'some title' ) );
... which gives me
ERROR: cannot cast type integer[] to integer
LINE 1: ...FROM tag WHERE tag.id IN ( (select article.tags::int4 from ...
I am stuck with PostgreSQL 8.3 in both dev and production environments.
Use the array overlaps operator &&:
SELECT *
FROM tag
WHERE ARRAY[id] && ANY (SELECT tags FROM article WHERE title = '...');
Using contrib/intarray you can even index this sort of thing quite well.
Take a look at section "8.14.5. Searching in Arrays", but consider the tip at the end of that section:
Tip: Arrays are not sets; searching for specific array elements can be a sign of database misdesign. Consider using a separate table with a row for each item that would be an array element. This will be easier to search, and is likely to scale better for a large number of elements.
You did not mention your Postgres version, so I assume you are using an up-to-date version (8.4, 9.0).
This should work then:
SELECT *
FROM tag
WHERE tag.id IN ( select unnest(tags)
from article
where title = 'some title' );
But you should really consider changing your table design.
Edit
For 8.3 the unnest() function can easily be added, see this wiki page:
http://wiki.postgresql.org/wiki/Array_Unnest
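As a rough illustration of the table redesign that the documentation tip and the answer above both suggest, here is a sketch with a separate link table holding one row per article/tag pair. It uses Python's built-in sqlite3 module as a stand-in for PostgreSQL; the table name article_tag, the helper tags_for_title and the sample rows are all made up for the example.

```python
import sqlite3


def tags_for_title(conn, title):
    """Return (id, description) of all tags attached to the given article title."""
    return conn.execute(
        """
        SELECT t.id, t.description
        FROM tag t
        JOIN article_tag atg ON atg.tag_id = t.id
        JOIN article a ON a.id = atg.article_id
        WHERE a.title = ?
        ORDER BY t.id
        """,
        (title,),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE article (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE tag (id INTEGER PRIMARY KEY, description TEXT NOT NULL);
    -- Hypothetical link table replacing the integer[] column.
    CREATE TABLE article_tag (
        article_id INTEGER NOT NULL REFERENCES article(id),
        tag_id INTEGER NOT NULL REFERENCES tag(id),
        PRIMARY KEY (article_id, tag_id)
    );
    -- Made-up sample data for illustration only.
    INSERT INTO article VALUES (1, 'some title'), (2, 'other title');
    INSERT INTO tag VALUES (1, 'postgres'), (2, 'arrays'), (3, 'design');
    INSERT INTO article_tag VALUES (1, 1), (1, 3), (2, 2);
    """
)
found = tags_for_title(conn, "some title")
print(found)
```

With this layout the lookup is an ordinary join, so it works the same on 8.3 and needs neither unnest() nor array operators.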