SQL Ordering a complex query - sql

Suppose I have these tables:
items, which stores items.
CREATE TABLE items
(
id serial NOT NULL,
name character varying(255),
rarity character varying(255),
created_at timestamp without time zone,
updated_at timestamp without time zone,
CONSTRAINT items_pkey PRIMARY KEY (id)
)
item_modifiers, which manages the many-to-many relationship between items and modifiers
CREATE TABLE item_modifiers
(
id serial NOT NULL,
item_id integer,
modifier_id integer,
primary_value integer,
secondary_value integer,
CONSTRAINT item_modifiers_pkey PRIMARY KEY (id)
)
modifiers, which contains all possible item modifiers
CREATE TABLE modifiers
(
id serial NOT NULL,
name character varying(255),
CONSTRAINT explicit_mods_pkey PRIMARY KEY (id)
)
Now suppose I have a complex query. I want to find all items that have modifiers with ID 1 and 2, ordered by the primary_value of the modifier with ID 1.
I have tried this query:
SELECT "items".*, item_modifiers.primary_value FROM "items"
INNER JOIN item_modifiers ON item_modifiers.item_id = items.id
AND ((item_modifiers.modifier_id = 1)
OR (item_modifiers.modifier_id = 2))
GROUP BY items.id, item_modifiers.primary_value
HAVING
COUNT(item_modifiers.id) = 2
ORDER BY item_modifiers.primary_value DESC
But it returns an empty result set. When I don't group by primary_value it does return rows, but then I can't order by it. I've been stuck on this for ages, so any help is greatly appreciated.
EDIT I have built an SQL fiddle to demonstrate
http://sqlfiddle.com/#!15/30887/1

You don't have to join item_modifiers a second time for modifier_id 2; you only need a guarantee that such a record EXISTS:
SELECT items.*, item_modifiers.primary_value
FROM items
INNER JOIN item_modifiers ON item_modifiers.item_id = items.id AND item_modifiers.modifier_id = 1
WHERE EXISTS
(
SELECT *
FROM item_modifiers
WHERE item_modifiers.item_id = items.id AND item_modifiers.modifier_id = 2
)
ORDER BY item_modifiers.primary_value DESC;
Here is your SQL fiddle: http://sqlfiddle.com/#!15/30887/9
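To see why the EXISTS version returns rows while the grouped version did not, here is a minimal sketch using Python's built-in sqlite3 module rather than PostgreSQL. The sample rows are invented for illustration and are not taken from the fiddle. Grouping by primary_value puts each item_modifiers row in its own group (unless the two values happen to be equal), so COUNT(...) = 2 cannot hold; joining only on modifier 1 and testing modifier 2 with EXISTS avoids the grouping altogether.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE item_modifiers (
    id INTEGER PRIMARY KEY,
    item_id INTEGER,
    modifier_id INTEGER,
    primary_value INTEGER
);
-- Hypothetical sample rows, not taken from the fiddle.
INSERT INTO items (id, name) VALUES (1, 'sword'), (2, 'shield'), (3, 'helmet');
INSERT INTO item_modifiers (item_id, modifier_id, primary_value) VALUES
    (1, 1, 10), (1, 2, 99),   -- sword: both modifiers
    (2, 1, 50),               -- shield: modifier 1 only
    (3, 1, 30), (3, 2, 5);    -- helmet: both modifiers
""")

# Join only the modifier-1 row (its primary_value drives the ordering)
# and use EXISTS to require that a modifier-2 row is present as well.
rows = conn.execute("""
SELECT items.name, im.primary_value
FROM items
INNER JOIN item_modifiers AS im
        ON im.item_id = items.id AND im.modifier_id = 1
WHERE EXISTS (
    SELECT 1
    FROM item_modifiers AS other
    WHERE other.item_id = items.id AND other.modifier_id = 2
)
ORDER BY im.primary_value DESC
""").fetchall()

print(rows)
```

The shield is excluded because it has no modifier 2, and the remaining items are ordered by the primary_value of modifier 1 only.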

Is this what you are looking for:
SELECT "items".*,
item_modifiers.primary_value
FROM "items"
INNER JOIN item_modifiers
ON item_modifiers.item_id = items.id
AND ( ( item_modifiers.modifier_id = 1 )
OR ( item_modifiers.modifier_id = 2 ) )
ORDER BY item_modifiers.primary_value DESC;

Related

PostgreSQL can't aggregate data from many tables

For simplicity, I will write the minimum number of fields in the tables.
Suppose I have these tables: items, item_photos, items_characteristics.
create table items (
id bigserial primary key,
title jsonb not null
);
create table item_photos (
id bigserial primary key,
path varchar(1000) not null,
item_id bigint references items (id) not null,
sort_order smallint not null,
unique (path, item_id)
);
create table items_characteristics (
item_id bigint references items (id),
characteristic_id bigint references characteristics (id),
characteristic_option_id bigint references characteristic_options (id),
numeric_value numeric(19, 2),
primary key (item_id, characteristic_id),
unique (item_id, characteristic_id, characteristic_option_id));
And I want to aggregate all the photos and characteristics of one item.
For a start, I got this.
select i.id as id,
i.title as title,
array_agg( ip.path) as photos,
array_agg(
array [ico.characteristic_id, ico.characteristic_option_id, ico.numeric_value]) as characteristics_array
FROM items i
LEFT JOIN item_photos ip on i.id = ip.item_id
LEFT JOIN items_characteristics ico on ico.item_id = i.id
GROUP BY i.id
The first problem is that if four rows in items_characteristics relate to one item and item_photos has no rows for it, the photos field comes back as an array of four null elements: {null, null, null, null}.
So I had to use array_remove:
array_remove(array_agg(ip.path), null) as photos
Further, if I have 1 photo and 4 characteristics, the photo is repeated 4 times, for example: {img/test-img-1.png,img/test-img-1.png,img/test-img-1.png,img/test-img-1.png}
So I had to use distinct:
array_remove(array_agg(distinct ip.path), null) as photos,
array_agg(distinct
array [ico.characteristic_id, ico.characteristic_option_id, ico.numeric_value]) as characteristics_array
This solution feels rather awkward to me.
The situation is complicated by the fact that I had to add 2 more fields to items_characteristics:
string_value jsonb, --string value
json_value jsonb --custom value
So I now need to aggregate 5 values from items_characteristics, 2 of which are jsonb, and distinct can have a very negative impact on performance.
Is there any more elegant solution?
Aggregate before joining:
SELECT i.id as id, i.title as title, ip.paths as photos,
ico.characteristics_array
FROM items i LEFT JOIN
(SELECT ip.item_id, array_agg( ip.path) as paths -- one row per item
FROM item_photos ip
GROUP BY ip.item_id
) ip
ON ip.item_id = i.id LEFT JOIN
(SELECT ico.item_id,
array_agg(array [ico.characteristic_id, ico.characteristic_option_id, ico.numeric_value]
) as characteristics_array
FROM items_characteristics ico
GROUP BY ico.item_id
) ico
ON ico.item_id = i.id
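The effect of aggregating before joining can be checked with a small sketch. It runs on Python's built-in sqlite3 module instead of PostgreSQL, so group_concat and COUNT stand in for array_agg, and the tables are cut down to a few hypothetical columns and rows. The idea carries over: each subquery yields at most one row per item_id, so the joins can no longer multiply photos by characteristics.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE items (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE item_photos (item_id INTEGER, path TEXT, sort_order INTEGER);
CREATE TABLE items_characteristics (item_id INTEGER, characteristic_id INTEGER);
-- Hypothetical sample data: item 1 has 1 photo and 4 characteristics,
-- item 2 has no photo and 1 characteristic.
INSERT INTO items VALUES (1, 'with photo'), (2, 'without photo');
INSERT INTO item_photos VALUES (1, 'img/test-img-1.png', 1);
INSERT INTO items_characteristics VALUES
    (1, 10), (1, 11), (1, 12), (1, 13), (2, 10);
""")

# Joining both child tables first multiplies rows: 1 photo x 4 characteristics.
naive_photo_rows = conn.execute("""
SELECT COUNT(ip.path)
FROM items i
LEFT JOIN item_photos ip ON ip.item_id = i.id
LEFT JOIN items_characteristics ico ON ico.item_id = i.id
WHERE i.id = 1
""").fetchone()[0]

# Aggregating each child table on its own keeps one row per item_id.
rows = conn.execute("""
SELECT i.id, p.photos, c.characteristic_count
FROM items i
LEFT JOIN (SELECT item_id, group_concat(path) AS photos
           FROM item_photos GROUP BY item_id) p ON p.item_id = i.id
LEFT JOIN (SELECT item_id, COUNT(*) AS characteristic_count
           FROM items_characteristics GROUP BY item_id) c ON c.item_id = i.id
ORDER BY i.id
""").fetchall()

print(naive_photo_rows, rows)
```

An item without photos gets NULL for the aggregated column rather than an array of nulls, so neither array_remove nor distinct is needed in this form.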

What is the most efficient way of joining tables of different dimensions?

I have the following schema:
CREATE TABLE products (
id BIGSERIAL NOT NULL,
created_at_timestamp TIMESTAMP NOT NULL DEFAULT NOW(),
last_update_timestamp TIMESTAMP NOT NULL DEFAULT NOW(),
PRIMARY KEY (id)
);
CREATE TABLE product_names (
product_id BIGINT NOT NULL,
language TEXT NOT NULL,
name TEXT NOT NULL,
PRIMARY KEY (product_id, language),
FOREIGN KEY (product_id) REFERENCES products (id)
);
CREATE TABLE product_summaries (
product_id BIGINT NOT NULL,
language TEXT NOT NULL,
summary TEXT NOT NULL,
PRIMARY KEY (product_id, language),
FOREIGN KEY (product_id) REFERENCES products (id)
);
And I want to select all Products.
However as you can see a Product contains a list of names and summaries (per language).
I can retrieve all Products
SELECT * FROM products
And then iterate all the rows (in this case in Kotlin), and then request the names and summaries:
SELECT * FROM product_names WHERE product_id = $id
And
SELECT * FROM product_summaries WHERE product_id = $id
However, this seems inefficient, since I am making 3 separate queries to the database.
I thought of using JOINs to get all of this with one query, but then I get multiple repeated rows for each product_names and product_summaries entry.
So in the end, is there a better way of requesting all this data in one query?
You definitely don't want to do multiple queries and then iterate over them in the code. That's horribly inefficient. When you do the second JOIN, you need to include language in the JOIN. That should keep you from getting duplicate rows. This should give you one row for each unique combination of [products.id, product_names.language]
SELECT
products.id
,products.created_at_timestamp
,products.last_update_timestamp
,product_names.name
,product_summaries.summary
,product_names.language
FROM
products
INNER JOIN
product_names ON product_names.product_id = products.id
INNER JOIN
product_summaries ON product_summaries.product_id = products.id
AND product_summaries.language = product_names.language
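As a rough check of the language condition, the sketch below uses Python's built-in sqlite3 module (not PostgreSQL) with the two child tables only. The rows are the ones that appear in the JSON output further down this thread. Joining on product_id alone pairs every name with every summary, while adding the language condition leaves one row per language.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product_names (product_id INTEGER, language TEXT, name TEXT,
                            PRIMARY KEY (product_id, language));
CREATE TABLE product_summaries (product_id INTEGER, language TEXT, summary TEXT,
                                PRIMARY KEY (product_id, language));
INSERT INTO product_names VALUES (1, 'EN', 'lol'), (1, 'DE', 'lel');
INSERT INTO product_summaries VALUES
    (1, 'EN', 'deded'), (1, 'DE', 'rererere'), (1, 'FR', 'jejejeje');
""")

# Joining on product_id only: 2 names x 3 summaries.
by_id_only = conn.execute("""
SELECT COUNT(*)
FROM product_names n
JOIN product_summaries s ON s.product_id = n.product_id
""").fetchone()[0]

# Joining on product_id and language: one row per shared language.
by_id_and_language = conn.execute("""
SELECT n.language, n.name, s.summary
FROM product_names n
JOIN product_summaries s
  ON s.product_id = n.product_id AND s.language = n.language
ORDER BY n.language
""").fetchall()

print(by_id_only, by_id_and_language)
```

Note that with an INNER JOIN the FR summary disappears because there is no FR name; a LEFT or FULL join would be needed if languages present in only one table must be kept.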
I've found a way of doing it:
SELECT * FROM products as p INNER JOIN
(SELECT json_agg(product_names) as names, product_id FROM product_names GROUP BY product_id) as tb_names ON tb_names.product_id = p.id
INNER JOIN
(SELECT json_agg(product_summaries) as summaries, product_id FROM product_summaries GROUP BY product_id) as tb_summaries ON tb_summaries.product_id = p.id
returns:
1 | 2018-07-20 09:36:21.56904 | 2018-07-20 09:36:21.56904 | [{"product_id":1,"language":"EN","name":"lol"},
{"product_id":1,"language":"DE","name":"lel"}] | 1 | [{"product_id":1,"language":"EN","summary":"deded"},
{"product_id":1,"language":"DE","summary":"rererere"},
{"product_id":1,"language":"FR","summary":"jejejeje"}] | 1
Basically I'm converting the multi-dimensional tables to JSON :)
Postgres is amazing!

postgres: (sub)select and combine optional content into an array

I have the following table structure:
Location----- * Media ----1 Attribute --------* AttributeTranslation
Each Location has n media items attached, each with one optional attribute (text) and n associated translations for that attribute.
I need to select this data into an array, so that for each location I get the associated media list for each language.
Here is what I currently do and what I get:
SELECT m.location_id, t.language_id,
array_agg_mult(
ARRAY[ARRAY[m.sortorder::text, m.filename, t.name]] ORDER BY m.sortorder
) as medialist
FROM Media m
LEFT JOIN ATTRIBUTE a ON a.id = m.attribute_id
LEFT JOIN AttributeTranslation t ON a.id = t.attribute_id
WHERE m.location_id = ?
GROUP BY m.location_id, t.language_id
This gives me following result for the given scenario: the current location has 4 images attached, only the first image has an associated attribute containing two translations:
Location_ID Language_ID MEDIALIST
AT_014 1 {{1,'location_image1.jpg','attribute german'}}
AT_014 2 {{1,'location_image1.jpg','attribute english'}}
AT_014 NULL {{2,'location_image2.jpg',null},{3,'location_image3.jpg',null},{4,'location_image4.jpg',null}}
But what I need instead is this:
Location_ID Language_ID MEDIALIST
AT_014 1 {{1,'location_image1.jpg','attribute german'},{2,'location_image2.jpg',null},{3,'location_image3.jpg',null},{4,'location_image4.jpg',null}}
AT_014 2 {{1,'location_image1.jpg','attribute english'},{2,'location_image2.jpg',null},{3,'location_image3.jpg',null},{4,'location_image4.jpg',null}}
Those 3 columns are part of a view, so that I can later run:
select * from locationview where location_id = ? and language_id = ?
How can I achieve the desired result here? Thanks in advance!
Simplified Table Definitions:
CREATE TABLE LOCATION (
location_id numeric(20) primary key,
description text
);
CREATE TABLE MEDIA (
media_id numeric(20) primary key,
fileName text,
sortorder smallint,
location_id numeric(20) references LOCATION(location_id),
attribute_id numeric(20) references ATTRIBUTE(attribute_id)
);
CREATE TABLE ATTRIBUTE (
attribute_id numeric(20) primary key,
attributetype varchar(100)
);
CREATE TABLE ATTRIBUTETRANSLATION (
translation_id numeric(20),
language_id smallint,
name text,
description text,
attribute_id numeric(20) references ATTRIBUTE(attribute_id)
);
ALTER TABLE ATTRIBUTETRANSLATION add constraint AT_ID primary key(translation_id, language_id)
I am not sure I fully understand your question, but here's an attempt. You could take the output of your query, and match each row that has a language_id with the corresponding rows where language_id is NULL, so that you can then concatenate the medialist arrays. Here's a way to do that by creating an alias of your query with a CTE:
WITH t AS (
SELECT m.location_id, t.language_id,
array_agg( -- built-in array_agg accepts array inputs from PostgreSQL 9.5 on
ARRAY[m.sortorder::text, m.filename, t.name] ORDER BY m.sortorder -- 1-D rows give a 2-D result
) as medialist
FROM Media m
LEFT JOIN ATTRIBUTE a ON a.attribute_id = m.attribute_id
LEFT JOIN AttributeTranslation t ON a.attribute_id = t.attribute_id
WHERE m.location_id = ?
GROUP BY m.location_id, t.language_id
)
SELECT location_id, t1.language_id, t1.medialist || t2.medialist AS medialist
FROM (SELECT * FROM t WHERE language_id IS NOT NULL) t1
RIGHT OUTER JOIN (SELECT * FROM t WHERE language_id IS NULL) t2 USING (location_id);
I am not sure if this does exactly what you want, but hopefully it will give you some ideas.

Optimise many-to-many join

I have three tables: groups and people and groups_people which forms a many-to-many relationship between groups and people.
Schema:
CREATE TABLE groups (
id SERIAL PRIMARY KEY,
name TEXT
);
CREATE TABLE people (
id SERIAL PRIMARY KEY,
name TEXT,
join_date TIMESTAMP
);
CREATE TABLE groups_people (
group_id INT REFERENCES groups(id),
person_id INT REFERENCES people(id)
);
I want to query for the 10 people who most recently joined the group with id = 1:
WITH person_ids AS (SELECT person_id FROM groups_people WHERE group_id = 1)
SELECT * FROM people WHERE id = ANY(SELECT person_id FROM person_ids)
ORDER BY join_date DESC LIMIT 10;
The query needs to scan all of the group's members and sort them before selecting the top 10. That would be slow if the group contains many people.
Is there any way to work around it?
Schema (re-)design to allow the same person to join multiple groups
Since you mentioned that the relationship between groups and people is many-to-many, I think you may want to move join_date from people to groups_people, because the same person can join different groups and each such event has its own join_date.
So I would change the schema to:
CREATE TABLE people (
id SERIAL PRIMARY KEY,
name TEXT --, -- change
-- join_date TIMESTAMP -- delete
);
CREATE TABLE groups_people (
group_id INT REFERENCES groups(id),
person_id INT REFERENCES people(id), -- change
join_date TIMESTAMP -- add
);
Query
select
p.id
, p.name
, gp.join_date
from
people as p
, groups_people as gp
where
p.id = gp.person_id
and gp.group_id=1
order by gp.join_date desc
limit 10
Disclaimer: The above query is in MySQL syntax (the question was originally tagged with MySQL)
This seems much easier to write as a simple join with order by and limit:
select p.*
from people p join
groups_people gp
on p.id = gp.person_id
where gp.group_id = 1
order by gp.join_date desc
limit 10; -- or fetch first 10 rows only
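A small sketch of the join with ORDER BY and LIMIT, assuming the redesigned schema in which join_date lives on groups_people. It runs on Python's built-in sqlite3 module rather than PostgreSQL, stores dates as ISO text so that they sort correctly as strings, and uses invented sample rows with LIMIT 2 instead of 10 to keep the output short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE groups_people (
    group_id INTEGER,
    person_id INTEGER REFERENCES people(id),
    join_date TEXT  -- ISO-8601 text, an assumption made for SQLite
);
-- Hypothetical sample data: cy belongs to both groups.
INSERT INTO people VALUES (1, 'ann'), (2, 'bob'), (3, 'cy');
INSERT INTO groups_people VALUES
    (1, 1, '2020-01-01'),
    (1, 2, '2020-03-01'),
    (2, 3, '2020-02-01'),
    (1, 3, '2020-02-15');
""")

# Latest joiners of group 1, newest first.
rows = conn.execute("""
SELECT p.name, gp.join_date
FROM people p
JOIN groups_people gp ON gp.person_id = p.id
WHERE gp.group_id = 1
ORDER BY gp.join_date DESC
LIMIT 2
""").fetchall()

print(rows)
```

Because each membership row carries its own join_date, cy's date in group 2 does not affect the ordering within group 1.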
Try rewriting using EXISTS:
SELECT *
FROM people p
WHERE EXISTS (SELECT 1
FROM groups_people ps
WHERE p.id = ps.person_id and group_id = 1)
ORDER BY join_date DESC
LIMIT 10;

SELECT Statement for cocktail db

This is probably pretty simple and dumb to ask, but I'm just not getting there right now. I have a db for cocktails and want to check which cocktails I can make with the available ingredients:
Get the names of all cocktails where every ingredient is in stock
These are my tables:
create table cocktails
(
name TEXT PRIMARY KEY
);
create table ingredients
(
name TEXT PRIMARY KEY
);
create table cocktail_ingredients
(
cocktail_name TEXT ,
ingredient_name TEXT ,
amount INTEGER ,
FOREIGN KEY ( cocktail_name ) REFERENCES cocktails( name ) ,
FOREIGN KEY ( ingredient_name ) REFERENCES ingredients( name )
);
create table ingredients_in_stock
(
ingredient_name TEXT ,
FOREIGN KEY ( ingredient_name ) REFERENCES ingredients ( name )
);
And this is my code so far:
SELECT ci.cocktail_name
FROM cocktail_ingredients ci
WHERE ci.ingredient_name IN ( SELECT iis.ingredient_name
FROM ingredients_in_stock iis
)
GROUP BY ci.cocktail_name
HAVING COUNT(*) = ( SELECT COUNT(*)
FROM ingredients_in_stock
)
;
You can use a LEFT JOIN and a NOT IN clause for this. Something like this:
SELECT name FROM cocktails WHERE Name NOT IN(
SELECT DISTINCT ci.cocktail_name FROM cocktail_ingredients ci LEFT JOIN ingredients_in_stock istk
ON ci.ingredient_name=istk.ingredient_name WHERE istk.ingredient_name IS NULL)
This query inverts the logic: the subquery finds every cocktail with at least one ingredient missing from the ingredients_in_stock table, and the outer query lists the cocktails that are not among them. Hope the idea helps you.
A correlated subquery should work:
select cocktail_name as all_ingredients_in_stock
from cocktail_ingredients ci
inner join ingredients_in_stock iis
on ci.ingredient_name = iis.ingredient_name
group by cocktail_name
having count(*) =
(select count(*)
from cocktail_ingredients
where cocktail_name = ci.cocktail_name
)
You could just say something like this:
select ci.cocktail_name
from cocktail_ingredients ci
left join ingredients_in_stock iis on iis.ingredient_name = ci.ingredient_name
group by ci.cocktail_name
having count(ci.ingredient_name) = sum( case
when iis.ingredient_name is not null
then 1
else 0
end
)
In the having clause:
count(ci.ingredient_name) gives you the total number of ingredients required for the cocktail.
The sum() expression gives you the number of those ingredients that are in stock.
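The count-versus-sum comparison can be tried out with a minimal sketch on Python's built-in sqlite3 module. The cocktails and stock below are invented sample data, and the foreign keys and the amount column are left out for brevity. A cocktail qualifies only when the number of required ingredients equals the number of those ingredients found in stock.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cocktail_ingredients (cocktail_name TEXT, ingredient_name TEXT);
CREATE TABLE ingredients_in_stock (ingredient_name TEXT);
-- Hypothetical sample data.
INSERT INTO cocktail_ingredients VALUES
    ('mojito', 'rum'), ('mojito', 'mint'), ('mojito', 'lime'),
    ('daiquiri', 'rum'), ('daiquiri', 'lime'),
    ('martini', 'gin'), ('martini', 'vermouth');
INSERT INTO ingredients_in_stock VALUES ('rum'), ('lime'), ('gin');
""")

# The LEFT JOIN keeps every required ingredient; iis.ingredient_name is NULL
# when the ingredient is not in stock, so the SUM counts only stocked ones.
rows = conn.execute("""
SELECT ci.cocktail_name
FROM cocktail_ingredients ci
LEFT JOIN ingredients_in_stock iis
       ON iis.ingredient_name = ci.ingredient_name
GROUP BY ci.cocktail_name
HAVING COUNT(ci.ingredient_name) =
       SUM(CASE WHEN iis.ingredient_name IS NOT NULL THEN 1 ELSE 0 END)
ORDER BY ci.cocktail_name
""").fetchall()

makeable = [name for (name,) in rows]
print(makeable)
```

With this data the mojito is missing mint and the martini is missing vermouth, so only the daiquiri remains.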