PostgreSQL can't aggregate data from many tables

For simplicity, I'll show only the minimum number of fields in each table.
Suppose I have these tables: items, item_photos, items_characteristics.
create table items (
id bigserial primary key,
title jsonb not null
);
create table item_photos (
id bigserial primary key,
path varchar(1000) not null,
item_id bigint references items (id) not null,
sort_order smallint not null,
unique (path, item_id)
);
create table items_characteristics (
item_id bigint references items (id),
characteristic_id bigint references characteristics (id),
characteristic_option_id bigint references characteristic_options (id),
numeric_value numeric(19, 2),
primary key (item_id, characteristic_id),
unique (item_id, characteristic_id, characteristic_option_id));
I want to aggregate all the photos and characteristics of one item.
As a first attempt, I wrote this:
select i.id as id,
i.title as title,
array_agg( ip.path) as photos,
array_agg(
array [ico.characteristic_id, ico.characteristic_option_id, ico.numeric_value]) as characteristics_array
FROM items i
LEFT JOIN item_photos ip on i.id = ip.item_id
LEFT JOIN items_characteristics ico on ico.item_id = i.id
GROUP BY i.id
The first problem: if there are 4 rows in items_characteristics for one item and, for example, no rows in item_photos, the photos field becomes an array of four null elements: {null, null, null, null}.
So I had to use array_remove:
array_remove(array_agg(ip.path), null) as photos
Next, if an item has 1 photo and 4 characteristics, the photo entry is repeated 4 times, for example: {img/test-img-1.png,img/test-img-1.png,img/test-img-1.png,img/test-img-1.png}
So I had to use distinct:
array_remove(array_agg(distinct ip.path), null) as photos,
array_agg(distinct
array [ico.characteristic_id, ico.characteristic_option_id, ico.numeric_value]) as characteristics_array
This solution feels rather awkward to me.
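Both symptoms come from joining two independent one-to-many tables in a single query: every photo row is paired with every characteristic row before array_agg ever runs. The following is a minimal sketch that reproduces this, using Python's built-in sqlite3 module as a stand-in for PostgreSQL (plain LEFT JOIN behaviour is the same in both). The trimmed schema and the sample rows are hypothetical:

```python
import sqlite3

# In-memory SQLite database standing in for PostgreSQL (assumption: plain
# LEFT JOIN semantics are identical in both engines).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE items (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE item_photos (
    id INTEGER PRIMARY KEY,
    path TEXT NOT NULL,
    item_id INTEGER NOT NULL REFERENCES items (id)
);
CREATE TABLE items_characteristics (
    item_id INTEGER REFERENCES items (id),
    characteristic_id INTEGER,
    PRIMARY KEY (item_id, characteristic_id)
);
-- Hypothetical data: item 1 has 1 photo and 4 characteristics,
-- item 2 has no photos and 4 characteristics.
INSERT INTO items VALUES (1, 'with photo'), (2, 'without photo');
INSERT INTO item_photos (path, item_id) VALUES ('img/test-img-1.png', 1);
INSERT INTO items_characteristics VALUES
    (1, 10), (1, 11), (1, 12), (1, 13),
    (2, 10), (2, 11), (2, 12), (2, 13);
""")


def joined_paths(item_id):
    """Photo paths exactly as the double LEFT JOIN produces them, before aggregation."""
    cur = conn.execute(
        """
        SELECT ip.path
        FROM items i
        LEFT JOIN item_photos ip ON ip.item_id = i.id
        LEFT JOIN items_characteristics ico ON ico.item_id = i.id
        WHERE i.id = ?
        """,
        (item_id,),
    )
    return [row[0] for row in cur]


print(joined_paths(1))  # the single path, repeated once per characteristic
print(joined_paths(2))  # one NULL (None) per characteristic
```

array_remove and DISTINCT only clean up these multiplied rows after the fact; the join itself still produces photos × characteristics rows per item.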
The situation is complicated by the fact that I had to add 2 more fields to items_characteristics:
string_value jsonb, --string value
json_value jsonb --custom value
So now I need to aggregate five values from items_characteristics, two of which are jsonb, and DISTINCT could have a very negative impact on performance.
Is there any more elegant solution?

Aggregate each child table before joining, so the photo rows and the characteristic rows never multiply each other:
SELECT i.id as id, i.title as title, ip.paths as photos,
ico.characteristics_array
FROM items i LEFT JOIN
(SELECT ip.item_id, array_agg( ip.path) as paths
FROM item_photos ip
GROUP BY ip.item_id
) ip
ON ip.item_id = i.id LEFT JOIN
(SELECT ico.item_id,
array_agg(array [ico.characteristic_id, ico.characteristic_option_id, ico.numeric_value]
) as characteristics_array
FROM items_characteristics ico
GROUP BY ico.item_id
) ico
ON ico.item_id = i.id;
Items with no photos (or no characteristics) get NULL in that column; wrap it in coalesce(ip.paths, '{}') if you prefer an empty array.
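A minimal sketch of the difference, again using Python's sqlite3 as a stand-in for PostgreSQL. SQLite has no array_agg, so count() is used as the aggregate here; the point is only where the aggregation happens. The schema is trimmed and the data is hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE items (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE item_photos (id INTEGER PRIMARY KEY, path TEXT NOT NULL,
                          item_id INTEGER NOT NULL REFERENCES items (id));
CREATE TABLE items_characteristics (item_id INTEGER REFERENCES items (id),
                                    characteristic_id INTEGER,
                                    PRIMARY KEY (item_id, characteristic_id));
-- Hypothetical data: item 1 has 2 photos and 3 characteristics,
-- item 2 has no photos and 1 characteristic.
INSERT INTO items VALUES (1, 'first'), (2, 'second');
INSERT INTO item_photos (path, item_id) VALUES ('a.png', 1), ('b.png', 1);
INSERT INTO items_characteristics VALUES (1, 10), (1, 11), (1, 12), (2, 10);
""")

# Join first, aggregate afterwards: item 1's counts are multiplied (2 x 3 = 6).
naive = conn.execute("""
    SELECT i.id, count(ip.path), count(ico.characteristic_id)
    FROM items i
    LEFT JOIN item_photos ip ON ip.item_id = i.id
    LEFT JOIN items_characteristics ico ON ico.item_id = i.id
    GROUP BY i.id ORDER BY i.id
""").fetchall()

# Aggregate each child table per item_id first, then join the
# one-row-per-item results: nothing is multiplied, no DISTINCT needed.
pre_aggregated = conn.execute("""
    SELECT i.id, coalesce(ip.n_photos, 0), coalesce(ico.n_chars, 0)
    FROM items i
    LEFT JOIN (SELECT item_id, count(*) AS n_photos
               FROM item_photos GROUP BY item_id) ip ON ip.item_id = i.id
    LEFT JOIN (SELECT item_id, count(*) AS n_chars
               FROM items_characteristics GROUP BY item_id) ico ON ico.item_id = i.id
    ORDER BY i.id
""").fetchall()

print(naive)           # [(1, 6, 6), (2, 0, 1)]
print(pre_aggregated)  # [(1, 2, 3), (2, 0, 1)]
```

Because each subquery returns at most one row per item_id, the outer LEFT JOINs cannot multiply anything; and since item_photos.path is NOT NULL, the pre-aggregated arrays contain no null elements either.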

Related

PostgreSQL: join two tables where the foreign key is an array of ids

I am new to SQL and I have three tables:
Templates Table
CREATE TABLE templates (
template_id serial PRIMARY KEY,
template_name VARCHAR ( 15 ) UNIQUE NOT NULL,
developer_id int,
category_id int,
tag_ids int[],
FOREIGN KEY (developer_id) REFERENCES users(user_id),
FOREIGN KEY (category_id) REFERENCES categories(category_id)
-- PostgreSQL has no foreign keys on array elements, so
-- "FOREIGN KEY (EACH ELEMENT OF tag_ids)" is rejected; tag_ids is not checked
);
Categories Table
CREATE TABLE categories (
category_id serial PRIMARY KEY,
category_name VARCHAR ( 15 ) UNIQUE NOT NULL
);
Tags Table
CREATE TABLE tags (
tag_id serial PRIMARY KEY,
tag_name VARCHAR ( 100 ) NOT NULL
);
I want to select all templates, where each template has a category object and a tags array.
Each template has one category but may have multiple tags.
I want the tags as an array attribute in the template object.
I have tried this query. It returns the data I want, but it produces multiple rows for the same template: n rows, where n is the number of tags.
let query = `SELECT t.*, to_json(c) "category", ${developerJson} "developer", json_agg(tgs) "tags"
FROM templates t
INNER JOIN categories c ON t.category_id = c.category_id
INNER JOIN users d ON t.developer_id = d.user_id
JOIN tags tgs ON tgs.tag_id = ANY(t.tag_ids)
${condition} ${groupBy}`;
Can anyone help me?
I found the solution: I had included tag_id in the GROUP BY list, which produced one group per tag.
Once I removed it, I got what I expected:
const developerJson = `json_build_object( 'first_name',first_name, 'last_name', last_name, 'avatar_link', avatar_link, 'slug', d.slug ,'date_joined',date_joined)`;
const groupBy = `GROUP BY t.template_id, c.*, d.first_name, d.last_name, d.avatar_link, d.slug, d.date_joined`;
let query = `SELECT t.*, to_json(c) "category", ${developerJson} "developer", json_agg(tgs) "tags"
FROM templates t
INNER JOIN categories c ON t.category_id = c.category_id
JOIN users d ON t.developer_id = d.user_id
JOIN tags tgs ON tgs.tag_id = ANY(t.tag_ids)
${groupBy}`;
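The effect of the GROUP BY list can be checked with a small sketch in Python's sqlite3. SQLite has no array columns or ANY(), so a hypothetical template_tags junction table stands in for tag_ids; the template and tag names are made up:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE templates (template_id INTEGER PRIMARY KEY, template_name TEXT);
CREATE TABLE tags (tag_id INTEGER PRIMARY KEY, tag_name TEXT);
-- Hypothetical junction table standing in for the tag_ids array.
CREATE TABLE template_tags (template_id INTEGER, tag_id INTEGER);
INSERT INTO templates VALUES (1, 'landing'), (2, 'blog');
INSERT INTO tags VALUES (1, 'dark'), (2, 'minimal'), (3, 'responsive');
INSERT INTO template_tags VALUES (1, 1), (1, 2), (1, 3), (2, 2);
""")

FROM_CLAUSE = """
    FROM templates t
    JOIN template_tags tt ON tt.template_id = t.template_id
    JOIN tags tgs ON tgs.tag_id = tt.tag_id
"""

# tag_id in GROUP BY: one group (and one output row) per template/tag pair.
per_tag = conn.execute(
    "SELECT t.template_id, count(*)" + FROM_CLAUSE
    + "GROUP BY t.template_id, tgs.tag_id"
).fetchall()

# Only the template key in GROUP BY: one row per template, tags aggregated.
per_template = conn.execute(
    "SELECT t.template_id, count(*)" + FROM_CLAUSE
    + "GROUP BY t.template_id ORDER BY t.template_id"
).fetchall()

print(len(per_tag))   # 4 rows: template 1 appears three times
print(per_template)   # [(1, 3), (2, 1)]
```

In the PostgreSQL query, json_agg(tgs) plays the role of count(*) here: it collapses all tag rows of a group into one value, so the group key decides how many rows come back.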

Get column values from mapping tables "id | value" binding

I am trying to get all the columns associated with my item. Some columns are "key | value" pairs, and that's where my problem is. My proposed structure is in the DDL below.
With this query I can retrieve a post along with all its associated tag names, but it only ever returns one post:
SELECT TOP(10)
bm.title, bm.post_id,
a.name AS tag1, b.name AS tag2, c.name AS tag3, d.name AS tag4
FROM
Posts AS bm
INNER JOIN
Tagmap AS tm
INNER JOIN
Tag AS a ON a.tag_id = tm.tag_id1
INNER JOIN
Tag AS b ON b.tag_id = tm.tag_id2
INNER JOIN
Tag AS c ON c.tag_id = tm.tag_id3
INNER JOIN
Tag AS d ON d.tag_id = tm.tag_id4
ON bm.post_id = tm.post_id
Here is the DDL for the tables:
CREATE TABLE Tag
(
tag_id int NOT NULL identity(0,1) primary key,
name nvarchar(30) NOT NULL
);
CREATE TABLE Posts
(
post_id int NOT NULL identity(0,1) primary key,
title nvarchar(50) not null
);
CREATE TABLE Tagmap
(
id int NOT NULL identity(0,1) primary key,
post_id int FOREIGN KEY REFERENCES Posts(post_id),
tag_id1 int FOREIGN KEY REFERENCES Tag(tag_id),
tag_id2 int FOREIGN KEY REFERENCES Tag(tag_id),
tag_id3 int FOREIGN KEY REFERENCES Tag(tag_id),
tag_id4 int FOREIGN KEY REFERENCES Tag(tag_id)
);
INSERT INTO Posts VALUES ('Title1');
INSERT INTO Posts VALUES ('Title2');
INSERT INTO Tag VALUES ('Tag number one');
INSERT INTO Tag VALUES ('Tag number two');
INSERT INTO Tag VALUES ('Tag number three');
INSERT INTO Tag VALUES ('Tag number four');
INSERT INTO Tagmap VALUES (0, 0, 1, 2, 3);
My question: is my approach totally off? Should I change the structure, or is it fine?
If it should change, how can it be improved, and how can I retrieve all these "key | value" columns along with my posts?
First, you should fix your data structure so that Tagmap has one row per (post_id, tag_id) pair, not four tag columns in a single row!
But even with your current structure, I imagine that not all posts have four tags. So, with your current data model you should be using LEFT JOIN rather than INNER JOIN.
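A minimal sketch of the suggested structure, using Python's sqlite3 as a stand-in for SQL Server (identity columns and nvarchar are replaced by SQLite types). The posts and tag names come from the question's INSERT statements; Tagmap now holds one row per (post_id, tag_id) pair, and LEFT JOINs keep posts that have no tags:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Posts (post_id INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE Tag (tag_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
-- One row per (post, tag) pair instead of four tag_idN columns.
CREATE TABLE Tagmap (
    post_id INTEGER NOT NULL REFERENCES Posts (post_id),
    tag_id  INTEGER NOT NULL REFERENCES Tag (tag_id),
    PRIMARY KEY (post_id, tag_id)
);
INSERT INTO Posts VALUES (0, 'Title1'), (1, 'Title2');
INSERT INTO Tag VALUES (0, 'Tag number one'), (1, 'Tag number two'),
                       (2, 'Tag number three'), (3, 'Tag number four');
-- Same tags for post 0 as the original Tagmap row; post 1 has no tags.
INSERT INTO Tagmap VALUES (0, 0), (0, 1), (0, 2), (0, 3);
""")

# LEFT JOINs keep posts without tags (their tag name comes back as NULL/None).
rows = conn.execute("""
    SELECT p.post_id, p.title, t.name
    FROM Posts p
    LEFT JOIN Tagmap tm ON tm.post_id = p.post_id
    LEFT JOIN Tag t ON t.tag_id = tm.tag_id
    ORDER BY p.post_id, t.tag_id
""").fetchall()

for row in rows:
    print(row)
```

With this layout a post can have any number of tags, and adding a fifth tag needs no schema change.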

How do I get an average across multiple tables

My homework is to get the average number of tags of a user (user_id = x) per album, using the following tables:
CREATE TABLE USERS (ID INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
NAME TEXT NOT NULL);
CREATE TABLE ALBUMS (ID INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
NAME TEXT NOT NULL, CREATION_DATE TEXT NOT NULL,
USER_ID INTEGER REFERENCES USERS(ID) NOT NULL);
CREATE TABLE PICTURES (ID INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
NAME TEXT NOT NULL,
LOCATION TEXT NOT NULL,
CREATION_DATE TEXT NOT NULL,
ALBUM_ID INTEGER REFERENCES ALBUMS(ID) NOT NULL);
CREATE TABLE TAGS (ID INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
PICTURE_ID INTEGER REFERENCES PICTURES(ID) NOT NULL,
USER_ID INTEGER REFERENCES USERS(ID) NOT NULL);
Explanation:
Each tag is a row in TAGS and has a picture_id, each picture has an album_id, and each album has a user_id. Basically, I need to count how many times the user is tagged in each album and find the average number of times the user is tagged in an album.
I may only use: SELECT ... FROM, AVG(), COUNT(), JOIN (INNER, LEFT, RIGHT, FULL JOIN), ON, IN, AND, OR, LIKE, NOT, =, !=, >, <, IS, DISTINCT, ORDER BY (ASC/DESC), LIMIT, OFFSET and WHERE. That means I cannot use GROUP BY.
I tried this:
SELECT * FROM TAGS INNER JOIN PICTURES ON tags.picture_id = PICTURES.Id where album_id IN (select id from ALBUMS where user_id = x) AND user_id = x;
but it only gives me a table with all the tags of the user.
How can I get the average number of tags per album for user_id = x? Is this even possible?
First count how many times the user is tagged in each album, then take the average of those counts:
select
avg(counter) averagetags
from (
select count(t.user_id) counter
from albums a
inner join pictures p on p.album_id = a.id
inner join tags t on t.picture_id = p.id
where t.user_id = ?
group by a.id
) per_album;
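The query above can be checked with Python's sqlite3 (the AUTOINCREMENT syntax suggests the homework uses SQLite); the data below is hypothetical. Since the restrictions rule out GROUP BY, the sketch also computes the same average without it: the user's total number of tags divided by the number of distinct albums they are tagged in. Like the query above, this only counts albums in which the user is tagged at least once. The division is done in Python in case arithmetic is not allowed in the SQL either:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE USERS (ID INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL, NAME TEXT NOT NULL);
CREATE TABLE ALBUMS (ID INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL, NAME TEXT NOT NULL,
    CREATION_DATE TEXT NOT NULL, USER_ID INTEGER REFERENCES USERS(ID) NOT NULL);
CREATE TABLE PICTURES (ID INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL, NAME TEXT NOT NULL,
    LOCATION TEXT NOT NULL, CREATION_DATE TEXT NOT NULL,
    ALBUM_ID INTEGER REFERENCES ALBUMS(ID) NOT NULL);
CREATE TABLE TAGS (ID INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
    PICTURE_ID INTEGER REFERENCES PICTURES(ID) NOT NULL,
    USER_ID INTEGER REFERENCES USERS(ID) NOT NULL);
-- Hypothetical data: user 1 is tagged 3 times in album 1 and once in album 2.
INSERT INTO USERS (NAME) VALUES ('ann'), ('bob');
INSERT INTO ALBUMS (NAME, CREATION_DATE, USER_ID)
    VALUES ('a1', '2020-01-01', 1), ('a2', '2020-01-01', 2);
INSERT INTO PICTURES (NAME, LOCATION, CREATION_DATE, ALBUM_ID) VALUES
    ('p1', 'x', '2020-01-01', 1), ('p2', 'x', '2020-01-01', 1),
    ('p3', 'x', '2020-01-01', 1), ('p4', 'x', '2020-01-01', 2);
INSERT INTO TAGS (PICTURE_ID, USER_ID) VALUES (1, 1), (2, 1), (3, 1), (4, 1), (4, 2);
""")

USER_ID = 1

# The answer's query: per-album counts in a subquery, averaged outside (uses GROUP BY).
(with_group_by,) = conn.execute("""
    SELECT avg(counter) AS averagetags
    FROM (
        SELECT count(t.user_id) AS counter
        FROM albums a
        INNER JOIN pictures p ON p.album_id = a.id
        INNER JOIN tags t ON t.picture_id = p.id
        WHERE t.user_id = ?
        GROUP BY a.id
    ) AS per_album
""", (USER_ID,)).fetchone()

# Without GROUP BY: the user's total tag count and the number of distinct
# albums they are tagged in; their ratio is the same per-album average.
total_tags, albums = conn.execute("""
    SELECT count(t.id), count(DISTINCT p.album_id)
    FROM tags t
    INNER JOIN pictures p ON p.id = t.picture_id
    WHERE t.user_id = ?
""", (USER_ID,)).fetchone()

print(with_group_by)        # 2.0
print(total_tags / albums)  # 2.0
```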

Select the products that appear in all interventions

Hello, my question is probably simple for some of you ^^
I have the tables products, replacements and interventions. When there is an intervention, the replacements table links the intervention to the products it needed.
I would like to find the products that were used in every intervention.
These are my tables:
--TABLE products
create table products (
reference char(5) not null check ( reference like 'DT___'),
designation char(50) not null,
price numeric (9,2) not null,
primary key(reference) );
-- TABLE interventions
create table interventions (
nointerv integer not null ,
dateinterv date not null,
nameresponsable char(30) not null,
nameinterv char(30) not null,
time float not null check ( time != 0 AND time between 0 and 8),
nocustomers integer not null ,
nofact integer not null ,
primary key( nointerv),
foreign key( nocustomers) references customers,
foreign key (nofact) references facts
);
-- TABLE replacements
create table replacements (
reference char(5) not null check ( reference like 'DT%'),
nointerv integer not null,
qtereplaced smallint,
primary key ( reference, nointerv ),
foreign key (reference) references products,
foreign key(nointerv) references interventions(nointerv)
);
EDIT:
I posted a picture of a SELECT on my replacements table.
It shows that product DT802 is used in every intervention.
Thanks ;)
This returns one row per intervention/product pair. Is this what you are expecting?
select interventions.nointerv, products.reference
from interventions
inner join replacements on interventions.nointerv = replacements.nointerv
inner join products on replacements.reference = products.reference;
Or this one, which returns only the products that appear in every intervention?
select products.reference, products.designation
from interventions
inner join replacements on interventions.nointerv = replacements.nointerv
inner join products on replacements.reference = products.reference
group by products.reference, products.designation
having count(*) = (select count(*) from interventions);
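The second query is a form of relational division: a product qualifies when it appears in as many interventions as exist in total. A minimal sketch with Python's sqlite3 as a stand-in for the original database; the tables are trimmed, and every row except the reference DT802 is hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (reference TEXT PRIMARY KEY, designation TEXT NOT NULL);
CREATE TABLE interventions (nointerv INTEGER PRIMARY KEY);
CREATE TABLE replacements (
    reference TEXT NOT NULL REFERENCES products,
    nointerv INTEGER NOT NULL REFERENCES interventions (nointerv),
    PRIMARY KEY (reference, nointerv)
);
-- Hypothetical data: DT802 is used in all three interventions, DT101 in one.
INSERT INTO products VALUES ('DT802', 'filter'), ('DT101', 'valve');
INSERT INTO interventions VALUES (1), (2), (3);
INSERT INTO replacements VALUES ('DT802', 1), ('DT802', 2), ('DT802', 3), ('DT101', 2);
""")

# A product qualifies when the number of interventions it appears in equals
# the total number of interventions. The (reference, nointerv) primary key
# guarantees that count(*) counts distinct interventions per product.
rows = conn.execute("""
    SELECT products.reference, products.designation
    FROM interventions
    INNER JOIN replacements ON interventions.nointerv = replacements.nointerv
    INNER JOIN products ON replacements.reference = products.reference
    GROUP BY products.reference, products.designation
    HAVING count(*) = (SELECT count(*) FROM interventions)
""").fetchall()

print(rows)  # [('DT802', 'filter')]
```

Without that primary key, count(DISTINCT replacements.nointerv) would be needed in the HAVING clause to avoid counting the same intervention twice.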
Your question is hard to follow. If I interpret it as finding every nointerv in replacements that uses all products, then:
select nointerv
from replacements r
group by nointerv
having count(distinct reference) = (select count(*) from products);

What is the most efficient way of joining tables of different dimensions?

I have the following schema:
CREATE TABLE products (
id BIGSERIAL NOT NULL,
created_at_timestamp TIMESTAMP NOT NULL DEFAULT NOW(),
last_update_timestamp TIMESTAMP NOT NULL DEFAULT NOW(),
PRIMARY KEY (id)
);
CREATE TABLE product_names (
product_id BIGINT NOT NULL,
language TEXT NOT NULL,
name TEXT NOT NULL,
PRIMARY KEY (product_id, language),
FOREIGN KEY (product_id) REFERENCES products (id)
);
CREATE TABLE product_summaries (
product_id BIGINT NOT NULL,
language TEXT NOT NULL,
summary TEXT NOT NULL,
PRIMARY KEY (product_id, language),
FOREIGN KEY (product_id) REFERENCES products (id)
);
I want to select all products.
However, as you can see, each product has a list of names and summaries (one per language).
I can retrieve all products with:
SELECT * FROM products
Then I iterate over all the rows (in this case in Kotlin) and request the names and summaries:
SELECT * FROM product_names WHERE product_id = $id
And
SELECT * FROM product_summaries WHERE product_id = $id
However, this seems inefficient, since I am making one query for the products plus two more queries for every product.
I thought of using JOINs to get all of this in one query, but then I get repeated rows for each product_names and product_summaries entry.
So in the end, is there a better way of requesting all this data in one query?
You definitely don't want to run multiple queries and then iterate over them in code; that's horribly inefficient. When you do the second JOIN, you need to include language in the join condition. That keeps you from getting duplicate rows and gives you one row for each unique combination of [products.id, product_names.language]. Because these are INNER JOINs, a language that has a name but no summary (or vice versa) is left out.
SELECT
products.id
,products.created_at_timestamp
,products.last_update_timestamp
,product_names.name
,product_summaries.summary
,product_names.language
FROM
products
INNER JOIN
product_names ON product_names.product_id = products.id
INNER JOIN
product_summaries ON product_summaries.product_id = products.id
AND product_summaries.language = product_names.language
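A small sketch with Python's sqlite3 showing why language must be part of the second join condition. It reuses the names and summaries from the JSON output later in this thread (EN and DE names; EN, DE and FR summaries) and omits the timestamp columns:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (id INTEGER PRIMARY KEY);
CREATE TABLE product_names (product_id INTEGER NOT NULL REFERENCES products (id),
    language TEXT NOT NULL, name TEXT NOT NULL, PRIMARY KEY (product_id, language));
CREATE TABLE product_summaries (product_id INTEGER NOT NULL REFERENCES products (id),
    language TEXT NOT NULL, summary TEXT NOT NULL, PRIMARY KEY (product_id, language));
INSERT INTO products VALUES (1);
INSERT INTO product_names VALUES (1, 'EN', 'lol'), (1, 'DE', 'lel');
INSERT INTO product_summaries VALUES
    (1, 'EN', 'deded'), (1, 'DE', 'rererere'), (1, 'FR', 'jejejeje');
""")


def joined(match_language):
    """Join names and summaries, optionally requiring the languages to match."""
    # Only a fixed string is interpolated here, never user input.
    extra = "AND s.language = n.language" if match_language else ""
    return conn.execute(f"""
        SELECT p.id, n.language, s.language
        FROM products p
        JOIN product_names n ON n.product_id = p.id
        JOIN product_summaries s ON s.product_id = p.id {extra}
        ORDER BY n.language, s.language
    """).fetchall()


print(len(joined(False)))  # 6: every name paired with every summary (2 x 3)
print(joined(True))        # [(1, 'DE', 'DE'), (1, 'EN', 'EN')]; FR has no name
```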
I've found a way of doing it:
SELECT * FROM products as p INNER JOIN
(SELECT json_agg(product_names) as names, product_id FROM product_names GROUP BY product_id) as tb_names ON tb_names.product_id = p.id
INNER JOIN
(SELECT json_agg(product_summaries) as summaries, product_id FROM product_summaries GROUP BY product_id) as tb_summaries ON tb_summaries.product_id = p.id
returns:
1 | 2018-07-20 09:36:21.56904 | 2018-07-20 09:36:21.56904 | [{"product_id":1,"language":"EN","name":"lol"},
{"product_id":1,"language":"DE","name":"lel"}] | 1 | [{"product_id":1,"language":"EN","summary":"deded"},
{"product_id":1,"language":"DE","summary":"rererere"},
{"product_id":1,"language":"FR","summary":"jejejeje"}] | 1
Basically I'm converting the multi-dimensional tables to JSON :)
Postgres is amazing!