What is the most efficient way of joining tables of different dimensions? - sql

I have the following schema:
CREATE TABLE products (
id BIGSERIAL NOT NULL,
created_at_timestamp TIMESTAMP NOT NULL DEFAULT NOW(),
last_update_timestamp TIMESTAMP NOT NULL DEFAULT NOW(),
PRIMARY KEY (id)
);
CREATE TABLE product_names (
product_id BIGINT NOT NULL,
language TEXT NOT NULL,
name TEXT NOT NULL,
PRIMARY KEY (product_id, language),
FOREIGN KEY (product_id) REFERENCES products (id)
);
CREATE TABLE product_summaries (
product_id BIGINT NOT NULL,
language TEXT NOT NULL,
summary TEXT NOT NULL,
PRIMARY KEY (product_id, language),
FOREIGN KEY (product_id) REFERENCES products (id)
);
And I want to select all Products.
However as you can see a Product contains a list of names and summaries (per language).
I can retrieve all Products
SELECT * FROM products
And then iterate all the rows (in this case in Kotlin), and then request the names and summaries:
SELECT * FROM product_names WHERE product_id = $id
And
SELECT * FROM product_summaries WHERE product_id = $id
However, this seems inefficient, since I am making one query for the products plus two more queries for every product.
I thought of using JOINs to get all of this with one query, but then I get multiple repeated rows for each product_names and product_summaries entry.
So in the end, is there a better way of requesting all this data in one query?

You definitely don't want to run a separate query per product and then stitch the results together in code; that is very inefficient. When you add the second JOIN, include language in the join condition. That keeps each name from being paired with every summary, so you get one row for each unique combination of [products.id, product_names.language]. Note that with INNER JOIN, a language that exists in only one of the two tables is left out of the result.
SELECT
products.id
,products.created_at_timestamp
,products.last_update_timestamp
,product_names.name
,product_summaries.summary
,product_names.language
FROM
products
INNER JOIN
product_names ON product_names.product_id = products.id
INNER JOIN
product_summaries ON product_summaries.product_id = products.id
AND product_summaries.language = product_names.language
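To see the effect of the extra join condition, here is a minimal sketch in Python. It uses an in-memory SQLite database as a stand-in for PostgreSQL (an assumption made only so the example runs anywhere), trims the products table to its id column, and reuses the sample rows from the asker's own output further down the thread.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (id INTEGER PRIMARY KEY);
CREATE TABLE product_names (
    product_id INTEGER NOT NULL REFERENCES products (id),
    language TEXT NOT NULL,
    name TEXT NOT NULL,
    PRIMARY KEY (product_id, language)
);
CREATE TABLE product_summaries (
    product_id INTEGER NOT NULL REFERENCES products (id),
    language TEXT NOT NULL,
    summary TEXT NOT NULL,
    PRIMARY KEY (product_id, language)
);
INSERT INTO products (id) VALUES (1);
INSERT INTO product_names VALUES (1, 'EN', 'lol'), (1, 'DE', 'lel');
INSERT INTO product_summaries VALUES
    (1, 'EN', 'deded'), (1, 'DE', 'rererere'), (1, 'FR', 'jejejeje');
""")

# Joining on product_id alone pairs every name with every summary (2 x 3 rows).
without_language = conn.execute("""
    SELECT n.language, s.language
    FROM products p
    JOIN product_names n ON n.product_id = p.id
    JOIN product_summaries s ON s.product_id = p.id
""").fetchall()

# Adding the language condition keeps one row per (product, language) pair
# that exists in BOTH tables; the FR summary has no matching name and drops out.
with_language = conn.execute("""
    SELECT p.id, n.language, n.name, s.summary
    FROM products p
    JOIN product_names n ON n.product_id = p.id
    JOIN product_summaries s ON s.product_id = p.id
                            AND s.language = n.language
    ORDER BY n.language
""").fetchall()

print(len(without_language))  # 6
print(with_language)
```

If languages can be missing from one of the tables and should still appear, the INNER JOINs would need to become outer joins; that is a design choice the original answer does not cover.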

I've found a way of doing it:
SELECT * FROM products as p INNER JOIN
(SELECT json_agg(product_names) as names, product_id FROM product_names GROUP BY product_id) as tb_names ON tb_names.product_id = p.id
INNER JOIN
(SELECT json_agg(product_summaries) as summaries, product_id FROM product_summaries GROUP BY product_id) as tb_summaries ON tb_summaries.product_id = p.id
returns:
1 | 2018-07-20 09:36:21.56904 | 2018-07-20 09:36:21.56904 | [{"product_id":1,"language":"EN","name":"lol"},
{"product_id":1,"language":"DE","name":"lel"}] | 1 | [{"product_id":1,"language":"EN","summary":"deded"},
{"product_id":1,"language":"DE","summary":"rererere"},
{"product_id":1,"language":"FR","summary":"jejejeje"}] | 1
Basically I'm converting the multi-dimensional tables to JSON :)
Postgres is amazing!
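The same idea (aggregate each child table to one JSON value per product, then join) can be tried out with a small Python sketch. The assumptions here: SQLite stands in for PostgreSQL, so json_group_array and json_object replace json_agg; LEFT JOIN is used instead of INNER JOIN so that a product without translations is still returned; and product 2 is a made-up row added to show that case.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (id INTEGER PRIMARY KEY);
CREATE TABLE product_names (
    product_id INTEGER NOT NULL, language TEXT NOT NULL, name TEXT NOT NULL,
    PRIMARY KEY (product_id, language)
);
CREATE TABLE product_summaries (
    product_id INTEGER NOT NULL, language TEXT NOT NULL, summary TEXT NOT NULL,
    PRIMARY KEY (product_id, language)
);
INSERT INTO products (id) VALUES (1), (2);  -- product 2 has no translations
INSERT INTO product_names VALUES (1, 'EN', 'lol'), (1, 'DE', 'lel');
INSERT INTO product_summaries VALUES
    (1, 'EN', 'deded'), (1, 'DE', 'rererere'), (1, 'FR', 'jejejeje');
""")

# Each subquery collapses a child table to one row per product before the join,
# so the outer query returns exactly one row per product.
rows = conn.execute("""
    SELECT p.id, tb_names.names, tb_summaries.summaries
    FROM products p
    LEFT JOIN (
        SELECT product_id,
               json_group_array(json_object('language', language, 'name', name)) AS names
        FROM product_names
        GROUP BY product_id
    ) tb_names ON tb_names.product_id = p.id
    LEFT JOIN (
        SELECT product_id,
               json_group_array(json_object('language', language, 'summary', summary)) AS summaries
        FROM product_summaries
        GROUP BY product_id
    ) tb_summaries ON tb_summaries.product_id = p.id
    ORDER BY p.id
""").fetchall()

# Decode the JSON text columns; a missing aggregate (NULL) becomes an empty list.
products = [
    {
        "id": product_id,
        "names": json.loads(names or "[]"),
        "summaries": json.loads(summaries or "[]"),
    }
    for product_id, names, summaries in rows
]

for product in products:
    print(product)
```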

Related

PostgreSQL: join two tables where the foreign key is an array of ids

I am new to SQL and I have three tables:
Templates Table
CREATE TABLE templates (
template_id serial PRIMARY KEY,
template_name VARCHAR ( 15 ) UNIQUE NOT NULL,
developer_id int REFERENCES users(user_id),
category_id int REFERENCES categories(category_id),
tag_ids int[]
-- PostgreSQL has no FOREIGN KEY (EACH ELEMENT OF tag_ids) constraint, so tag_ids cannot reference tags(tag_id) directly
);
Categories Table
CREATE TABLE categories (
category_id serial PRIMARY KEY,
category_name VARCHAR ( 15 ) UNIQUE NOT NULL
);
Tags Table
CREATE TABLE tags (
tag_id serial PRIMARY KEY,
tag_name VARCHAR ( 100 ) NOT NULL
);
I want to Select all templates where each template has a category object and a tags object.
Each template has one category but may have multiple tags.
I want to have the tags as an array attribute in the template object
I have tried this query. It does what I want, but it creates multiple objects for the same template: n objects, where n is the number of tags.
let query = `SELECT t.*, to_json(c) "category", ${developerJson} "developer", json_agg(tgs) "tags" FROM templates t INNER JOIN categories c ON t.category_id = c.category_id INNER JOIN users d ON t.developer_id = d.user_id JOIN tags tgs ON tgs.tag_id = ANY(t.tags_id) ${condition} ${groupBy}`;
Can anyone help me?
I have found the solution. I was including the tag_id in the GROUP BY columns.
Once I removed it, I got what I was expecting:
const developerJson = `json_build_object( 'first_name',first_name, 'last_name', last_name, 'avatar_link', avatar_link, 'slug', d.slug ,'date_joined',date_joined)`;
const groupBy = `GROUP BY t.template_id, c.*, d.first_name, d.last_name, d.avatar_link, d.slug, d.date_joined`;
let query = `SELECT t.*, to_json(c) "category", ${developerJson} "developer", json_agg(tgs) "tags" FROM templates t INNER JOIN categories c ON t.category_id = c.category_id JOIN users d ON t.developer_id = d.user_id JOIN tags tgs ON tgs.tag_id = ANY(t.tags_id) ${groupBy}`;
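The difference the GROUP BY list makes can be reproduced in a small Python sketch. Several things here are assumptions for illustration: SQLite stands in for PostgreSQL, the int[] column is stored as a JSON array and expanded with json_each instead of ANY(...), json_group_array replaces json_agg, and the sample tags and templates are made up.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tags (tag_id INTEGER PRIMARY KEY, tag_name TEXT NOT NULL);
CREATE TABLE templates (
    template_id INTEGER PRIMARY KEY,
    template_name TEXT NOT NULL,
    tag_ids TEXT NOT NULL  -- JSON array, standing in for PostgreSQL int[]
);
INSERT INTO tags VALUES (1, 'dark'), (2, 'blog'), (3, 'shop');
INSERT INTO templates VALUES (10, 'alpha', '[1, 2, 3]'), (11, 'beta', '[2]');
""")

# Grouping by the tag id as well makes every (template, tag) pair its own group,
# so a template with n tags comes back n times.
per_tag_rows = conn.execute("""
    SELECT t.template_id, json_group_array(tgs.tag_name)
    FROM templates t, json_each(t.tag_ids) AS e
    JOIN tags tgs ON tgs.tag_id = e.value
    GROUP BY t.template_id, tgs.tag_id
""").fetchall()

# Grouping by the template only yields one row per template with all its tags.
per_template_rows = conn.execute("""
    SELECT t.template_id, json_group_array(tgs.tag_name)
    FROM templates t, json_each(t.tag_ids) AS e
    JOIN tags tgs ON tgs.tag_id = e.value
    GROUP BY t.template_id
""").fetchall()

# Decode the aggregated JSON; sort because aggregate order is not guaranteed.
templates = {tid: sorted(json.loads(tags)) for tid, tags in per_template_rows}

print(len(per_tag_rows))  # 4
print(templates)
```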

PostgreSQL can't aggregate data from many tables

For simplicity, I will write the minimum number of fields in the tables.
Suppose I have these tables: items, item_photos, items_characteristics.
create table items (
id bigserial primary key,
title jsonb not null
);
create table item_photos (
id bigserial primary key,
path varchar(1000) not null,
item_id bigint references items (id) not null,
sort_order smallint not null,
unique (path, item_id)
);
create table items_characteristics (
item_id bigint references items (id),
characteristic_id bigint references characteristics (id),
characteristic_option_id bigint references characteristic_options (id),
numeric_value numeric(19, 2),
primary key (item_id, characteristic_id),
unique (item_id, characteristic_id, characteristic_option_id));
And I want to aggregate all the photos and characteristics of one item.
For a start, I got this.
select i.id as id,
i.title as title,
array_agg( ip.path) as photos,
array_agg(
array [ico.characteristic_id, ico.characteristic_option_id, ico.numeric_value]) as characteristics_array
FROM items i
LEFT JOIN item_photos ip on i.id = ip.item_id
LEFT JOIN items_characteristics ico on ico.item_id = i.id
GROUP BY i.id
The first problem is that if four rows in items_characteristics relate to one item and item_photos has no rows for it, I get an array of four null elements in the photos field: {null, null, null, null}.
So I had to use array_remove:
array_remove(array_agg(ip.path), null) as photos
Further, if I have 1 photo and 4 characteristics, I get a duplicate of 4 photo entries, for example: {img/test-img-1.png,img/test-img-1.png,img/test-img-1.png,img/test-img-1.png}
So I had to use distinct:
array_remove(array_agg(distinct ip.path), null) as photos,
array_agg(distinct
array [ico.characteristic_id, ico.characteristic_option_id, ico.numeric_value]) as characteristics_array
This solution seems rather awkward to me.
The situation is complicated by the fact that I had to add 2 more fields to items_characteristics:
string_value jsonb, --string value
json_value jsonb --custom value
And so I now need to aggregate 5 values from items_characteristics, 2 of which are jsonb, and distinct can have a very negative impact on performance.
Is there any more elegant solution?
Aggregate before joining:
SELECT i.id as id, i.title as title, ip.paths as photos,
ico.characteristics_array
FROM items i LEFT JOIN
(SELECT ip.item_id, array_agg( ip.path) as paths
FROM item_photos ip
GROUP BY ip.item_ID
) ip
ON ip.item_id = i.id LEFT JOIN
(SELECT ico.item_id,
array_agg(array [ico.characteristic_id, ico.characteristic_option_id, ico.numeric_value]
) as characteristics_array
FROM items_characteristics ico
GROUP BY ico.item_id
) ico
ON ico.item_id = i.id
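A minimal sketch of why aggregating before joining avoids the duplicates, using the asker's case of 1 photo and 4 characteristics. Assumptions: SQLite stands in for PostgreSQL, the tables are trimmed to the columns needed, and COUNT replaces array_agg so the fan-out shows up as a number.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE items (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE item_photos (
    id INTEGER PRIMARY KEY,
    path TEXT NOT NULL,
    item_id INTEGER NOT NULL REFERENCES items (id)
);
CREATE TABLE items_characteristics (
    item_id INTEGER REFERENCES items (id),
    characteristic_id INTEGER,
    numeric_value NUMERIC,
    PRIMARY KEY (item_id, characteristic_id)
);
INSERT INTO items VALUES (1, 'test item');
INSERT INTO item_photos VALUES (1, 'img/test-img-1.png', 1);
INSERT INTO items_characteristics VALUES
    (1, 1, 10), (1, 2, 20), (1, 3, 30), (1, 4, 40);
""")

# Join first, aggregate afterwards: the single photo row is repeated once per
# characteristic, so the photo count is inflated to 4.
join_then_aggregate = conn.execute("""
    SELECT i.id, COUNT(ip.path), COUNT(ico.characteristic_id)
    FROM items i
    LEFT JOIN item_photos ip ON ip.item_id = i.id
    LEFT JOIN items_characteristics ico ON ico.item_id = i.id
    GROUP BY i.id
""").fetchone()

# Aggregate each child table on its own, then join the one-row-per-item results.
aggregate_then_join = conn.execute("""
    SELECT i.id,
           COALESCE(ip.photo_count, 0),
           COALESCE(ico.characteristic_count, 0)
    FROM items i
    LEFT JOIN (
        SELECT item_id, COUNT(*) AS photo_count
        FROM item_photos
        GROUP BY item_id
    ) ip ON ip.item_id = i.id
    LEFT JOIN (
        SELECT item_id, COUNT(*) AS characteristic_count
        FROM items_characteristics
        GROUP BY item_id
    ) ico ON ico.item_id = i.id
""").fetchone()

print(join_then_aggregate)   # (1, 4, 4)
print(aggregate_then_join)   # (1, 1, 4)
```

Because each subquery already returns at most one row per item, neither DISTINCT nor array_remove is needed to undo the join's row multiplication.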

How to relate tables SQL

I have three tables and I want to relate them, but I don't know what I'm doing wrong. If my approach is wrong, can you correct me as well?
I have a clients table with the id_c column as its primary key,
create table clients
(
id_c INTEGER not null,
name VARCHAR2(20),
age INTEGER,
address VARCHAR2(20),
Primary key (id_c)
);
I also have a products table with the id_p column as its primary key.
create table PRODUCTS
(
id_p NUMBER not null,
name_product VARCHAR2(30),
price NUMBER,
duration NUMBER,
primary key (id_p)
);
and now I create a third:
create table TRANSACTIONS
(
id_t NUMBER not null,
id_c NUMBER not null,
id_p NUMBER not null,
primary key (ID_t),
foreign key (ID_c) references CLIENTS (ID_c),
foreign key (ID_p) references PRODUCTS (ID_p)
);
and now I want to see all the records that are connected, so I'm trying this:
select * from transactions join clients using (id_c) and join products using (id_p);
but the only thing that works is
select * from transactions join clients using (id_c);
Is this a relational database, or am I doing something too simple and primitive? How can I connect everything?
try this
select *
from transactions
inner join clients on transactions.id_c = clients.id_c
inner join products on transactions.id_p = products.id_p;
Are you just trying to join?
select * from transactions a
join clients b on a.id_c = b.id_c
join products c on a.id_p = c.id_p
If you want to join 3 tables, just write:
SELECT * FROM TRANSACTIONS t JOIN clients c on t.id_c = c.id_c JOIN PRODUCTS p on t.id_p = p.id_p
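The answers above can be checked with a short Python sketch. Assumptions: SQLite stands in for Oracle, and the sample clients, products and transactions are made up. It also shows that the asker's original USING form works once the stray "and" between the two joins is removed, since joins are simply chained one after another.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE clients (id_c INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE products (id_p INTEGER PRIMARY KEY, name_product TEXT);
CREATE TABLE transactions (
    id_t INTEGER PRIMARY KEY,
    id_c INTEGER NOT NULL REFERENCES clients (id_c),
    id_p INTEGER NOT NULL REFERENCES products (id_p)
);
INSERT INTO clients VALUES (1, 'Ann'), (2, 'Bob');
INSERT INTO products VALUES (1, 'Loan'), (2, 'Deposit');
INSERT INTO transactions VALUES (1, 1, 2), (2, 2, 1), (3, 1, 1);
""")

# Explicit ON conditions, as in the answers.
rows_on = conn.execute("""
    SELECT t.id_t, c.name, p.name_product
    FROM transactions t
    JOIN clients c ON c.id_c = t.id_c
    JOIN products p ON p.id_p = t.id_p
    ORDER BY t.id_t
""").fetchall()

# The asker's USING syntax, with the joins chained and no "and" between them.
rows_using = conn.execute("""
    SELECT id_t, name, name_product
    FROM transactions
    JOIN clients USING (id_c)
    JOIN products USING (id_p)
    ORDER BY id_t
""").fetchall()

print(rows_on)
```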

NOT EXISTS clause

I am trying to find Products that have never been ordered. My 2 tables look like this.
CREATE TABLE Orders
(OrderNum NUMBER(10) NOT NULL,
OrderDate DATE NOT NULL,
Cust NUMBER(10),
Rep NUMBER(10),
Mfr CHAR(3) NOT NULL,
Product CHAR(5) NOT NULL,
Qty NUMBER(5) NOT NULL,
Amount NUMBER(9,2) NOT NULL,
CONSTRAINT OrdersPK
PRIMARY KEY (OrderNum));
CREATE TABLE Products
(Mfr CHAR(3) NOT NULL,
Product CHAR(5) NOT NULL,
Description VARCHAR2(20) NOT NULL,
Price NUMBER(9,2) NOT NULL,
QtyOnHand NUMBER(5),
CONSTRAINT ProductsPK
PRIMARY KEY (Mfr, Product));
The code I currently have looks like this.
SELECT Mfr, Product
FROM Products
WHERE NOT EXISTS (SELECT Products.Mfr
FROM Orders, Products
WHERE Orders.Mfr = Products.Mfr);
Although I am not getting any errors there are also no results showing up.
EDIT: There are 26 Products and 19 of them have been ordered. I am expecting to get 7 results but I am getting 0.
You can use NOT EXISTS, but you need to compare both keys:
SELECT p.Mfr, p.Product
FROM Products p
WHERE NOT EXISTS (SELECT 1
FROM Orders o
WHERE o.Mfr = p.Mfr AND
o.Product = p.Product
);
This is a case where it makes lots of sense to have an auto generated primary key that can be used for foreign key relationships.
Try this one
SELECT Mfr, Product
FROM Products
WHERE NOT EXISTS (SELECT Orders.Mfr
FROM Orders
WHERE Orders.Mfr = Products.Mfr AND Orders.Product = Products.Product);
An alternative is to use the set operator EXCEPT (spelled MINUS in Oracle releases before 21c), since you want "the set of Products that don't exist in Orders":
SELECT
Mfr,
Product
FROM
Products
EXCEPT
SELECT
DISTINCT
Mfr,
Product
FROM
Orders
You can then use this as a subquery to get full product information.
SELECT
*
FROM
Products
INNER JOIN (
SELECT
Mfr,
Product
FROM
Products
EXCEPT
SELECT
DISTINCT
Mfr,
Product
FROM
Orders
) AS ProductsWithNoOrders ON
Products.Mfr = ProductsWithNoOrders.Mfr AND
Products.Product = ProductsWithNoOrders.Product
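A small Python sketch comparing the asker's query with the two fixes suggested above. Assumptions: SQLite stands in for Oracle, the tables are trimmed to their key columns, and the sample rows are made up. The asker's subquery lists Products again in its own FROM clause, so it never refers to the outer row; it returns rows for every product, and NOT EXISTS is therefore always false.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Products (
    Mfr TEXT NOT NULL,
    Product TEXT NOT NULL,
    PRIMARY KEY (Mfr, Product)
);
CREATE TABLE Orders (
    OrderNum INTEGER PRIMARY KEY,
    Mfr TEXT NOT NULL,
    Product TEXT NOT NULL
);
INSERT INTO Products VALUES ('ACI', '41001'), ('ACI', '41002'), ('REI', '2A44L');
INSERT INTO Orders VALUES (1, 'ACI', '41001');
""")

# The asker's version: the inner Products hides the outer one, so the
# subquery is not correlated and the result is empty.
uncorrelated = conn.execute("""
    SELECT Mfr, Product
    FROM Products
    WHERE NOT EXISTS (SELECT Products.Mfr
                      FROM Orders, Products
                      WHERE Orders.Mfr = Products.Mfr)
""").fetchall()

# Correlated on both key columns, as in the answers.
not_exists = sorted(conn.execute("""
    SELECT p.Mfr, p.Product
    FROM Products p
    WHERE NOT EXISTS (SELECT 1
                      FROM Orders o
                      WHERE o.Mfr = p.Mfr
                        AND o.Product = p.Product)
""").fetchall())

# The set-based alternative; EXCEPT already removes duplicates.
except_rows = sorted(conn.execute("""
    SELECT Mfr, Product FROM Products
    EXCEPT
    SELECT Mfr, Product FROM Orders
""").fetchall())

print(uncorrelated)  # []
print(not_exists)    # [('ACI', '41002'), ('REI', '2A44L')]
```

Correlating on Mfr alone would wrongly hide ('ACI', '41002'), because another ACI product has been ordered; that is why both key columns are compared.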

Select newest entry from a joined MySQL table

I have stock quantity information in my database.
1 table, "stock", holds the productid (sku) along with the quantity and the filename from where it came.
The other table, "stockfile", contains all the processed filenames along with dates.
Now I need to get all the products with their latest stock quantity values.
This gives me ALL the products multiple times with all their stock quantity (resulting in 300.000 records)
SELECT stock.stockid, stock.sku, stock.quantity, stockfile.filename, stockfile.date
FROM stock
INNER JOIN stockfile ON stock.stockfileid = stockfile.stockfileid
ORDER BY stock.sku ASC
I already tried this:
SELECT * FROM stock
INNER JOIN stockfile ON stock.stockfileid = stockfile.stockfileid
GROUP BY sku
HAVING stockfile.date = MAX( stockfile.date )
ORDER BY stock.sku ASC
But it did not work.
SHOW CREATE TABLE stock:
CREATE TABLE stock (
stockid bigint(20) NOT NULL AUTO_INCREMENT,
sku char(25) NOT NULL,
quantity int(5) NOT NULL,
creationdate datetime NOT NULL,
stockfileid smallint(5) unsigned NOT NULL,
touchdate datetime NOT NULL,
PRIMARY KEY (stockid)
) ENGINE=MyISAM AUTO_INCREMENT=315169 DEFAULT CHARSET=latin1
SHOW CREATE TABLE stockfile:
CREATE TABLE stockfile (
stockfileid smallint(5) unsigned NOT NULL AUTO_INCREMENT,
filename varchar(25) NOT NULL,
creationdate datetime DEFAULT NULL,
touchdate datetime DEFAULT NULL,
date datetime DEFAULT NULL,
begindate datetime DEFAULT NULL,
enddate datetime DEFAULT NULL,
PRIMARY KEY (stockfileid)
) ENGINE=MyISAM AUTO_INCREMENT=265 DEFAULT CHARSET=latin1
This is an example of the frequently-asked "greatest-n-per-group" question that we see every week on StackOverflow. Follow that tag to see other similar solutions.
SELECT s.*, f1.*
FROM stock s
INNER JOIN stockfile f1
ON (s.stockfileid = f1.stockfileid)
LEFT OUTER JOIN stockfile f2
ON (s.stockfileid = f2.stockfileid AND f1.date < f2.date)
WHERE f2.stockfileid IS NULL;
If there are multiple rows in stockfile that have the max date, you'll get them both in the result set. To resolve this, you'd have to add some tie-breaker conditions into the join on f2.
Thanks for adding the CREATE TABLE info. That's very helpful when you're asking SQL questions.
I see from the AUTO_INCREMENT table options that you have 315k rows in stock and only 265 rows in stockfile. Your stockfile table is the parent in the relationship, and the stock table is the child, with a column stockfileid that references the primary key of stockfile.
So your original question was misleading. You want the latest row from stock, not the latest row from stockfile.
SELECT f.*, s1.*
FROM stockfile f
INNER JOIN stock s1
ON (f.stockfileid = s1.stockfileid)
LEFT OUTER JOIN stock s2
ON (f.stockfileid = s2.stockfileid AND (s1.touchdate < s2.touchdate
OR s1.touchdate = s2.touchdate AND s1.stockid < s2.stockid))
WHERE s2.stockid IS NULL;
I'm assuming you want "latest" to be relative to touchdate, so if you want to use creationdate instead, you can do the edit.
I've added a term to the join so that it resolves ties. I know you said the dates are "practically unique" but as the saying goes, "one in a million is next Tuesday."
Okay, I think I understand what you're trying to do now. You want the most recent row per sku, but the date by which to compare them is in the referenced table stockfile.
SELECT s1.*, f1.*
FROM stock s1
JOIN stockfile f1 ON (s1.stockfileid = f1.stockfileid)
LEFT OUTER JOIN (stock s2 JOIN stockfile f2 ON (s2.stockfileid = f2.stockfileid))
ON (s1.sku = s2.sku AND (f1.date < f2.date OR f1.date = f2.date AND f1.stockfileid < f2.stockfileid))
WHERE s2.sku IS NULL;
This does a self-join of stock to itself, looking for a row with the same sku and a more recent date. When none is found, then s1 contains the most recent row for its sku. And each instance of stock has to join to its stockfile to get the date.
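The exclusion-join pattern described above can be tried with a minimal Python sketch. Assumptions: SQLite stands in for MySQL, the tables are trimmed to the relevant columns, the sample rows are made up, and the parenthesized join is written as a derived table named newer (a hypothetical alias), which expresses the same logic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE stockfile (
    stockfileid INTEGER PRIMARY KEY,
    filename TEXT NOT NULL,
    date TEXT
);
CREATE TABLE stock (
    stockid INTEGER PRIMARY KEY,
    sku TEXT NOT NULL,
    quantity INTEGER NOT NULL,
    stockfileid INTEGER NOT NULL
);
INSERT INTO stockfile VALUES (1, 'a.csv', '2024-01-01'), (2, 'b.csv', '2024-01-02');
INSERT INTO stock VALUES (1, 'SKU1', 5, 1), (2, 'SKU1', 7, 2), (3, 'SKU2', 3, 1);
""")

# For every stock row, look for another row of the same sku whose file is newer
# (ties broken by stockfileid). Rows with no such match are the latest per sku.
latest_per_sku = conn.execute("""
    SELECT s1.sku, s1.quantity, f1.filename
    FROM stock s1
    JOIN stockfile f1 ON f1.stockfileid = s1.stockfileid
    LEFT JOIN (
        SELECT s2.sku, f2.date, f2.stockfileid
        FROM stock s2
        JOIN stockfile f2 ON f2.stockfileid = s2.stockfileid
    ) newer
      ON newer.sku = s1.sku
     AND (f1.date < newer.date
          OR (f1.date = newer.date AND f1.stockfileid < newer.stockfileid))
    WHERE newer.sku IS NULL
    ORDER BY s1.sku
""").fetchall()

print(latest_per_sku)  # [('SKU1', 7, 'b.csv'), ('SKU2', 3, 'a.csv')]
```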
Re comment about optimization: It's hard for me to test because I don't have tables populated with data matching yours, but I'd guess you should have the following indexes:
CREATE INDEX stock_sku ON stock(sku);
CREATE INDEX stock_stockfileid ON stock(stockfileid);
CREATE INDEX stockfile_date ON stockfile(date);
I'd suggest using EXPLAIN to analyze the query without the indexes, and then create one index at a time and re-analyze with EXPLAIN to see which one gives the most direct benefit.
Use:
SELECT DISTINCT s.stockid,
s.sku,
s.quantity,
sf.filename,
sf.date
FROM STOCK s
JOIN STOCKFILE sf ON sf.stockfileid = s.stockfileid
JOIN (SELECT t.stockfileid,
MAX(t.date) 'max_date'
FROM STOCKFILE t
GROUP BY t.stockfileid) x ON x.stockfileid = sf.stockfileid
AND x.max_date = sf.date
select *
from stock
where stockfileid = (
select stockfileid
from stockfile
order by date desc
limit 1
)
There are two common ways to accomplish this: a subquery or a self-join.
See this example of selecting the group-wise maximum at the MySQL site.
Edit, an example using a subquery:
SELECT stock.stockid, stock.sku, stock.quantity,
stockfile.filename, stockfile.date
FROM stock
INNER JOIN stockfile ON stock.stockfileid = stockfile.stockfileid
WHERE stockfile.date = (SELECT MAX(date) FROM stockfile);