Join a table to itself using an array column? - sql

I have a table:
id integer
status_id integer
children integer[]
How do I write a query to join the table to itself on children and find all records where (all children items have status_id 2) & the item has status of 1?
In addition should children be indexed and with what index?
Edit: Based on krokodilko answer I think the query may be:
SELECT id
FROM (
SELECT p.id, p.status_id, every(c.status_id = 2) AS all_children_status_2
FROM positions p
JOIN positions c
ON c.id = ANY (p.children)
GROUP BY p.id, p.status_id
) sub
WHERE all_children_status_2 IS TRUE AND status_id = 1;
Edit 2:
Please note I have found in my reading that the array columns should use a GIN or GIST index. However unfortunately postgres does not use these indexes when using ANY. These mean that while the above query works it is very slow.

Use ANY operator:
Demo: http://www.sqlfiddle.com/#!17/2540d/1
CREATE TABLE parent(
id int,
child int[]
);
INSERT INTO parent VALUES(1, '{2,4,5}');
CREATE TABLE child(
id int
);
INSERT INTO child VALUES (1),(2),(3),(4),(5);
SELECT *
FROM parent p
JOIN child c
ON c.id = any( p.child );
| id | child | id |
|----|-------|----|
| 1 | 2,4,5 | 2 |
| 1 | 2,4,5 | 4 |
| 1 | 2,4,5 | 5 |
In addition should children be indexed and with what index?
Yes - if children table is huge (more than a few hundreds/thousands rows).
CREATE INDEX some_name ON child( id )

Related

Result of query as column value

I've got three tables:
Lessons:
CREATE TABLE lessons (
id SERIAL PRIMARY KEY,
title text NOT NULL,
description text NOT NULL,
vocab_count integer NOT NULL
);
+----+------------+------------------+-------------+
| id | title | description | vocab_count |
+----+------------+------------------+-------------+
| 1 | lesson_one | this is a lesson | 3 |
| 2 | lesson_two | another lesson | 2 |
+----+------------+------------------+-------------+
Lesson_vocabulary:
CREATE TABLE lesson_vocabulary (
lesson_id integer REFERENCES lessons(id),
vocabulary_id integer REFERENCES vocabulary(id)
);
+-----------+---------------+
| lesson_id | vocabulary_id |
+-----------+---------------+
| 1 | 1 |
| 1 | 2 |
| 1 | 3 |
| 2 | 2 |
| 2 | 4 |
+-----------+---------------+
Vocabulary:
CREATE TABLE vocabulary (
id integer PRIMARY KEY,
hiragana text NOT NULL,
reading text NOT NULL,
meaning text[] NOT NULL
);
Each lesson contains multiple vocabulary, and each vocabulary can be included in multiple lessons.
How can I get the vocab_count column of the lessons table to be calculated and updated whenevr I add more rows to the lesson_vocabulary table. Is this possible, and how would I go about doing this?
Thanks
You can use SQL triggers to serve your purpose. This would be similar to mysql after insert trigger which updates another table's column.
The trigger would look somewhat like this. I am using Oracle SQL, but there would just be minor tweaks for any other implementation.
CREATE TRIGGER vocab_trigger
AFTER INSERT ON lesson_vocabulary
FOR EACH ROW
begin
for lesson_cur in (select LESSON_ID, COUNT(VOCABULARY_ID) voc_cnt from LESSON_VOCABULARY group by LESSON_ID) LOOP
update LESSONS
set VOCAB_COUNT = LESSON_CUR.VOC_CNT
where id = LESSON_CUR.LESSON_ID;
end loop;
END;
It's better to create a view that calculates that (and get rid of the column in the lessons table):
select l.*, lv.vocab_count
from lessons l
left join (
select lesson_id, count(*)
from lesson_vocabulary
group by lesson_id
) as lv(lesson_id, vocab_count) on l.id = lv.lesson_id
If you really want to update the lessons table each time the lesson_vocabulary changes, you can run an UPDATE statement like this in a trigger:
update lessons l
set vocab_count = t.cnt
from (
select lesson_id, count(*) as cnt
from lesson_vocabulary
group by lesson_id
) t
where t.lesson_id = l.id;
I would recommend using a query for this information:
select l.*,
(select count(*)
from lesson_vocabulary lv
where lv.lesson_id = l.lesson_id
) as vocabulary_cnt
from lessons l;
With an index on lesson_vocabulary(lesson_id), this should be quite fast.
I recommend this over an update, because the data remains correct.
I recommend this over a trigger, because it is simpler.
I recommend this over a subquery with aggregation because it should be faster, particularly if you are filtering on the lessons table.

Set results to be array in a new column for over a million rows -postgreSQL

I am trying to get all the photo urls into an array and set that array to be a new column in a different table. The tables have a one to many relationship. I have about 1 million rows on my list_reviews table and around 3 million photos.
Is there a way to do this in batches? When I tried to do it all in one shot I just got empty arrays.
https://www.postgresql.org/message-id/20051219121211.002f7e87.gry#ll.mit.edu and Postgresql select rows(a result) as array
Those work if I am only doing one at a time. I have been thinking about trying to use the STREAMING found here, https://github.com/vitaly-t/pg-promise/wiki/Learn-by-Example#into-database but not sure I totally understand what is going on here.
CREATE TABLE list_reviews (
id SERIAL PRIMARY KEY,
product_id INT,
photos TEXT[]);
CREATE TABLE review_photos (
id SERIAL,
review_id INT REFERENCES list_reviews(id) ON DELETE CASCADE,
url TEXT);
UPDATE list_reviews SET photos = array(
SELECT url
FROM review_photos
WHERE review_photos.id = list_reviews.id
AND list_reviews.id = 5);
list_reviews looks like:
+----+------------+--------+--+
| id | product_id | photos | |
+----+------------+--------+--+
| 5 | 1 | [] | |
+----+------------+--------+--+
review_photos looks like:
+----+-----------+------------+--+
| id | review_id | photos | |
+----+-----------+------------+--+
| 1 | 5 | something1 | |
| 2 | 5 | something2 | |
| 3 | 5 | something3 | |
+----+-----------+------------+--+
and would expect to see list_reviews:
+----+------------+--------------------------------------+--+
| id | product_id | photos | |
+----+------------+--------------------------------------+--+
| 5 | 1 | [something1, something2, something3] | |
+----+------------+--------------------------------------+--+
Your code basically looks okay. I prefer using array_agg() (because the operation is more explicit), but Postgres allows a result set for array.
One issue is the filtering. I think you intend:
UPDATE list_reviews lr
SET photos = array(SELECT rp.url
FROM review_photos rp
WHERE rp.id = lr.id
)
WHERE lr.id = 5;
Your query will update all rows in list_reviews with the urls from the photos for id = 5.
You can do this in batches by putting ranges on lr.id as you do the processing. For instance:
UPDATE list_reviews lr
SET photos = array(SELECT rp.url
FROM review_photos rp
WHERE rp.id = lr.id
)
WHERE lr.id > 0 and lr.id < 10000;
However, it might be simpler to replace the existing table:
create temporary table temp_list_reviews as
select id, product_id, -- all columns but photos
array(SELECT rp.url
FROM review_photos rp
WHERE rp.id = lr.id
) as photos
from list_reviews;
truncate table list_reviews;
insert into list_reviews (id, product_id, photos)
select id, product_id, photo
from temp_list_reviews;
Bulk inserts are usually faster than updates, due to logging considerations.

SQL Server - Get Parent and Ancestor PK

I am using SQL Server 2012, I have a table called [Area]. This table contains a PK and a ParentPK. The ParentPk references the PK from the same [Area] table recursively. If there is no parent, then null is filled in for ParentPk.
Pk ParentPk
----------------
1 null
2 null
3 1
4 3
...
I know that the ancestor-child relationship is exactly 3 levels deep. In the above example, The Pk:4 has its parentPk:3 and grandParentPk:1.
I want to be able have a SELECT query be in the form:
SELECT GrandParentPk, ParentPk, ChildPk
...
...
...
where ChildPk = <childPk>
Is there a non stored procedure, non recursive solution to achieve this?
If you are always querying using the last child pk, and the hierarchy is always three levels, you can just use two inner joins.
select gp.pk as GrandParentPk, p.pk as ParentPk, c.pk as ChildPk
from Area c
inner join Area p
on c.parentPk = p.pk
inner join Area gp
on p.parentPk = gp.pk
where c.pk = 4
rextester demo: http://rextester.com/MSXTVR55260
returns:
+---------------+----------+---------+
| GrandParentPk | ParentPk | ChildPk |
+---------------+----------+---------+
| 1 | 3 | 4 |
+---------------+----------+---------+

SQL index based search

I have a table called Index which has the columns id and value, where id is an auto-increment bigint and value is a varchar with an english word.
I have a table called Search which has relationships to the table Index. For each search you can define which indexes it should search in a table called Article.
The table Article also has relationships to the table Index.
The tables which define the relationships are:
Searches_Indexes with columns id_search and id_index.
Articles_Indexes with columns id_article and id_index.
I would like to find all Articles that contain the same indexes of Search.
For example: I have a Search with indexes laptop and dell, I would like to retrieve all Articles which contain both indexes, not just one.
So far I have this:
SELECT ai.id_article
FROM articles_indexes AS ai
INNER JOIN searches_indexes AS si
ON si.id_index = ai.id_index
WHERE si.id_search = 1
How do I make my SQL only return the Articles with all the Indexes of a Search?
Edit:
Sample Data:
Article:
id | name | description | ...
1 | 'Dell Laptop' | 'New Dell Laptop...' | ...
2 | 'HP Laptop' | 'Unused HP Laptop...' | ...
...
Search:
id | name | id_user | ...
1 | 'Dell Laptop Search' | 5 | ...
Index:
id | value
1 | 'dell'
2 | 'laptop'
3 | 'hp'
4 | 'new'
5 | 'unused'
...
Articles_Indexes:
Article with id 1 (the dell laptop) has the Indexes 'dell', 'laptop', 'new'.
Article with id 2 (the hp laptop) has the Indexes 'laptop', 'hp', 'unused'.
id_article | id_index
1 | 1
1 | 2
1 | 4
...
2 | 2
2 | 3
2 | 5
...
Searches_Indexes:
Search with id 1 only contains 2 Indexes, 'dell' and 'laptop':
id_search | id_index
1 | 1
1 | 2
Required output:
id_article
1
If I understand correctly, you want aggregation and a HAVING clause. Assuming there are no duplicate entries in the indexes tables:
SELECT ai.id_article
FROM articles_indexes ai INNER JOIN
searches_indexes si
ON si.id_index = ai.id_index
WHERE si.id_search = 1
GROUP BY ai.id_article
HAVING COUNT(*) = (SELECT COUNT(*) FROM searches_indexes si2 WHERE si2.id_search = 1);
This counts the number of matches and makes sure it matches the number you are looking for.
I should add this. If you wanted to look for all searches at the same time, I'd be inclined to write this as:
SELECT si.id_search, ai.id_article
FROM articles_indexes ai INNER JOIN
(SELECT si.*, COUNT(*) OVER (PARTITION BY si.id_index) as cnt
FROM searches_indexes si
) si
ON si.id_index = ai.id_index
GROUP BY si.id_search, ai.id_article, si.cnt
HAVING COUNT(*) = si.cnt;
You can compare arrays. Here is some example:
create table article_index(id_article int, id_index int);
create table search_index(id_search int, id_index int);
insert into article_index
select generate_series(1,2), generate_series(1,10);
insert into search_index
select generate_series(1,2), generate_series(1,4);
select
id_article
from article_index
group by id_article
having array_agg(id_index) #> (select array_agg(id_index) from search_index where id_search = 2);
Learn more about arrays in postgres.

Creating a query to find matching objects in a "join" table

I am trying to find an efficient query to find all matching objects in a "join" table.
Given an object Adopter that has many Pets, and Pets that have many Adopters through a AdopterPets join table. How could I find all of the Adopters that have the same Pets?
The schema is fairly normalized and looks like this.
TABLE Adopter
INTEGER id
TABLE AdopterPets
INTEGER adopter_id
INTEGER pet_id
TABLE Pets
INTEGER id
Right now the solution I am using loops through all Adopters and asks for their pets anytime it we have a match store it away and can use it later, but I am sure there has to be a better way using SQL.
One SQL solution I looked at was GROUP BY but it did not seem to be the right trick for this problem.
EDIT
To explain a little more of what I am looking for I will try to give an example.
+---------+ +------------------+ +------+
| Adptors | | AdptorsPets | | Pets |
|---------| +----------+-------+ |------|
| 1 | |adptor_id | pet_id| | 1 |
| 2 | +------------------+ | 2 |
| 3 | |1 | 1 | | 3 |
+---------+ |2 | 1 | +------+
|1 | 2 |
|3 | 1 |
|3 | 2 |
|2 | 3 |
+------------------+
When you asked the Adopter with the id of 1 for any other Adopters that have the same Pets you would be retured id 3.
If you asked the same question for the Adopter with the id of 3 you would get id 1.
If you asked again the same question of the Adopter with id 2` you would be returned nothing.
I hope this helps clear things up!
Thank you all for the help, I used a combination of a few things:
SELECT adopter_id
FROM (
SELECT adopter_id, array_agg(pet_id ORDER BY pet_id)
AS pets
FROM adopters_pets
GROUP BY adopter_id
) AS grouped_pets
WHERE pets = array[1,2,3] #array must be ordered
AND adopter_id <> current_adopter_id;
In the subquery I get pet_ids grouped by their adopter. The ordering of the pet_ids is key so that the results in the main query will not be order dependent.
In the main query I compare the results of the subquery to the pet ids of the adopter I am looking to match. For the purpose of this answer the pet_ids of the particular adopter are represented by [1,2,3]. I then make sure that that the adopter I am comparing to is not included in the results.
Let me know if anyone sees any optimizations or if there is a way to compare arrays where order does not matter.
I'm not sure if this is exactly what you're looking for but this might give you some ideas.
First I created some sample data:
create table adopter (id serial not null primary key, name varchar );
insert into adopter (name) values ('Bob'), ('Sally'), ('John');
create table pets (id serial not null primary key, kind varchar);
insert into pets (kind) values ('Dog'), ('Cat'), ('Rabbit'), ('Snake');
create table adopterpets (adopter_id integer, pet_id integer);
insert into adopterpets values (1, 1), (1, 2), (2, 1), (2,3), (2,4), (3, 1), (3,3);
Next I ran this query:
SELECT p.kind, array_agg(a.name) AS adopters
FROM pets p
JOIN adopterpets ap ON ap.pet_id = p.id
JOIN adopter a ON a.id = ap.adopter_id
GROUP BY p.kind
HAVING count(*) > 1
ORDER BY kind;
kind | adopters
--------+------------------
Dog | {Bob,Sally,John}
Rabbit | {Sally,John}
(2 rows)
In this example, for each pet I'm creating an array of all owners. The HAVING count(*) > 1 clause ensures we only show pets with shared owners (more than 1). If we leave this out we'll include pets that don't share owners.
UPDATE
#scommette: Glad you've got it working! I've refactored your working example a little bit below to:
use #> operator. This checks if one array contains the other avoids need to explicitly set order
moved the grouped_pets subquery to a CTE. This isn't only solution but neatly allows you to both filter out the current_adopter_id and get the pets for that id
You might find it helpful to wrap this in a function.
WITH grouped_pets AS (
SELECT adopter_id, array_agg(pet_id ORDER BY pet_id) AS pets
FROM adopters_pets
GROUP BY adopter_id
)
SELECT * FROM grouped_pets
WHERE adopter_id <> 3
AND pets #> (
SELECT pets FROM grouped_pets WHERE adopter_id = 3
);
If you're using Oracle then wm_concat could be useful here
select pet_id, wm_concat(adopter_id) adopters
from AdopterPets
group by pet_id ;
--
-- Relational division 1.0
-- Show all people who own *exactly* the same (non-empty) set
-- of animals as I do.
--
-- Test data
CREATE TABLE adopter (id INTEGER NOT NULL primary key, fname varchar );
INSERT INTO adopter (id,fname) VALUES (1,'Bob'), (2,'Alice'), (3,'Chris');
CREATE TABLE pets (id INTEGER NOT NULL primary key, kind varchar);
INSERT INTO pets (id,kind) VALUES (1,'Dog'), (2,'Cat'), (3,'Pig');
CREATE TABLE adopterpets (adopter_id integer REFERENCES adopter(id)
, pet_id integer REFERENCES pets(id)
);
INSERT INTO adopterpets (adopter_id,pet_id) VALUES (1, 1), (1, 2), (2, 1), (2,3), (3,1), (3,2);
-- Show it to the world
SELECT ap.adopter_id, ap.pet_id
, a.fname, p.kind
FROM adopterpets ap
JOIN adopter a ON a.id = ap.adopter_id
JOIN pets p ON p.id = ap.pet_id
ORDER BY ap.adopter_id,ap.pet_id;
SELECT DISTINCT other.fname AS same_as_me
FROM adopter other
-- moi has *at least* one same kind of animal as toi
WHERE EXISTS (
SELECT * FROM adopterpets moi
JOIN adopterpets toi ON moi.pet_id = toi.pet_id
WHERE toi.adopter_id = other.id
AND moi.adopter_id <> toi.adopter_id
-- C'est moi!
AND moi.adopter_id = 1 -- 'Bob'
-- But moi should not own an animal that toi doesn't have
AND NOT EXISTS (
SELECT * FROM adopterpets lnx
WHERE lnx.adopter_id = moi.adopter_id
AND NOT EXISTS (
SELECT *
FROM adopterpets lnx2
WHERE lnx2.adopter_id = toi.adopter_id
AND lnx2.pet_id = lnx.pet_id
)
)
-- ... And toi should not own an animal that moi doesn't have
AND NOT EXISTS (
SELECT * FROM adopterpets rnx
WHERE rnx.adopter_id = toi.adopter_id
AND NOT EXISTS (
SELECT *
FROM adopterpets rnx2
WHERE rnx2.adopter_id = moi.adopter_id
AND rnx2.pet_id = rnx.pet_id
)
)
)
;
Result:
NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "adopter_pkey" for table "adopter"
CREATE TABLE
INSERT 0 3
NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "pets_pkey" for table "pets"
CREATE TABLE
INSERT 0 3
CREATE TABLE
INSERT 0 6
adopter_id | pet_id | fname | kind
------------+--------+-------+------
1 | 1 | Bob | Dog
1 | 2 | Bob | Cat
2 | 1 | Alice | Dog
2 | 3 | Alice | Pig
3 | 1 | Chris | Dog
3 | 2 | Chris | Cat
(6 rows)
same_as_me
------------
Chris
(1 row)