I am trying to get all the columns associated with my item. Some columns are "key | value" pairs, and that is where my problem is. My idea for a structure looks like this:
I can retrieve 1 item from Posts along with all associated tag names with this query, but the problem is that I can only get 1 post:
SELECT TOP(10)
bm.title, bm.post_id,
a.name AS tag1, b.name AS tag2, c.name AS tag3, d.name AS tag4
FROM
Posts AS bm
INNER JOIN
Tagmap AS tm
INNER JOIN
Tag AS a ON a.tag_id = tm.tag_id1
INNER JOIN
Tag AS b ON b.tag_id = tm.tag_id2
INNER JOIN
Tag AS c ON c.tag_id = tm.tag_id3
INNER JOIN
Tag AS d ON d.tag_id = tm.tag_id4
ON bm.post_id = tm.post_id
Here is the DDL for the table, or you can get it from this PasteBin link:
CREATE TABLE Tag
(
tag_id int NOT NULL identity(0,1) primary key,
name nvarchar(30) NOT NULL
);
-- Posts must exist before Tagmap can reference it
CREATE TABLE Posts
(
post_id int NOT NULL identity(0,1) primary key,
title nvarchar(50) not null
);
CREATE TABLE Tagmap
(
id int NOT NULL identity(0,1) primary key,
post_id int FOREIGN KEY REFERENCES Posts(post_id),
tag_id1 int FOREIGN KEY REFERENCES Tag(tag_id),
tag_id2 int FOREIGN KEY REFERENCES Tag(tag_id),
tag_id3 int FOREIGN KEY REFERENCES Tag(tag_id),
tag_id4 int FOREIGN KEY REFERENCES Tag(tag_id)
);
INSERT INTO Posts VALUES ('Title1');
INSERT INTO Posts VALUES ('Title2');
INSERT INTO Tag VALUES ('Tag number one');
INSERT INTO Tag VALUES ('Tag number two');
INSERT INTO Tag VALUES ('Tag number three');
INSERT INTO Tag VALUES ('Tag number four');
INSERT INTO Tagmap VALUES (0, 0, 1, 2, 3);
My question: is my approach totally off? Should I change the structure or is it good?
If so how can it be better and how can I retrieve all these "key | value" columns along with my posts?
First, you should fix your data structure so that Tagmap has one row per post_id and tag_id pair, not four tag columns in a single row.
But even with your current structure, I imagine that not all posts have four tags. So, with your current data model, you should be using LEFT JOIN rather than INNER JOIN.
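A minimal sketch of that suggestion, using Python's built-in sqlite3 module as a stand-in for SQL Server. The normalized Tagmap layout (one row per post/tag pair, composite primary key) and the explicit ids are assumptions made for illustration, not the asker's actual schema.

```python
import sqlite3

# In-memory SQLite stands in for SQL Server; the schema is a hypothetical
# normalized variant of the question's tables (one Tagmap row per post/tag pair).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Posts  (post_id INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE Tag    (tag_id  INTEGER PRIMARY KEY, name  TEXT NOT NULL);
CREATE TABLE Tagmap (
    post_id INTEGER NOT NULL REFERENCES Posts(post_id),
    tag_id  INTEGER NOT NULL REFERENCES Tag(tag_id),
    PRIMARY KEY (post_id, tag_id)
);
INSERT INTO Posts VALUES (0, 'Title1'), (1, 'Title2');
INSERT INTO Tag VALUES (0, 'Tag number one'), (1, 'Tag number two'),
                       (2, 'Tag number three'), (3, 'Tag number four');
-- Only the first post is tagged, as in the question's sample data.
INSERT INTO Tagmap VALUES (0, 0), (0, 1), (0, 2), (0, 3);
""")

# LEFT JOIN keeps posts that have no tags at all.
rows = conn.execute("""
    SELECT p.post_id, p.title, t.name
    FROM Posts AS p
    LEFT JOIN Tagmap AS tm ON tm.post_id = p.post_id
    LEFT JOIN Tag    AS t  ON t.tag_id   = tm.tag_id
    ORDER BY p.post_id, t.tag_id
""").fetchall()

# Collect the tag names per post on the client side.
posts = {}
for post_id, title, tag_name in rows:
    tags = posts.setdefault((post_id, title), [])
    if tag_name is not None:
        tags.append(tag_name)

for (post_id, title), tags in posts.items():
    print(post_id, title, tags)
```

With this layout a post can carry any number of tags, and the untagged post still shows up with an empty list. On SQL Server 2017 or later, STRING_AGG could collapse the names into one column on the server instead; that is an option, not something the answer prescribes.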
I am new to SQL and I have three tables:
Templates Table
CREATE TABLE templates (
template_id serial PRIMARY KEY,
template_name VARCHAR ( 15 ) UNIQUE NOT NULL,
FOREIGN KEY (developer_id) REFERENCES users(user_id),
FOREIGN KEY (category_id) REFERENCES categories(category_id),
tag_ids int[],
FOREIGN KEY (EACH ELEMENT OF tag_ids) REFERENCES tags(tag_id)
);
Categories Table
CREATE TABLE categories (
category_id serial PRIMARY KEY,
category_name VARCHAR ( 15 ) UNIQUE NOT NULL
);
Tags Table
CREATE TABLE tags (
tag_id serial PRIMARY KEY,
tag_name VARCHAR ( 100 ) NOT NULL
);
I want to Select all templates where each template has a category object and a tags object.
Each template has one category but may have multiple tags.
I want to have the tags as an array attribute in the template object
I have tried this query. It does what I want, but it creates multiple objects for the same template: it creates n objects, where n is the number of tags.
let query = `SELECT t.*, to_json(c) "category", ${developerJson} "developer", json_agg(tgs) "tags" FROM templates t INNER JOIN categories c ON t.category_id = c.category_id INNER JOIN users d ON t.developer_id = d.user_id JOIN tags tgs ON tgs.tag_id = ANY(t.tags_id) ${condition} ${groupBy}`;
Can anyone help me?
I have found the solution. I was including tag_id in the GROUP BY columns.
Once I removed it, I got what I was expecting:
const developerJson = `json_build_object( 'first_name',first_name, 'last_name', last_name, 'avatar_link', avatar_link, 'slug', d.slug ,'date_joined',date_joined)`;
const groupBy = `GROUP BY t.template_id, c.*, d.first_name, d.last_name, d.avatar_link, d.slug, d.date_joined`;
let query = `SELECT t.*, to_json(c) "category", ${developerJson} "developer", json_agg(tgs) "tags" FROM templates t INNER JOIN categories c ON t.category_id = c.category_id JOIN users d ON t.developer_id = d.user_id JOIN tags tgs ON tgs.tag_id = ANY(t.tags_id) ${groupBy}`;
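To illustrate why the extra GROUP BY column mattered, here is a small sketch with Python's sqlite3 module. SQLite has no array columns, so the template_tags junction table and the sample names are hypothetical stand-ins for the int[] column, and group_concat plays the role of json_agg.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE templates (template_id INTEGER PRIMARY KEY, template_name TEXT);
CREATE TABLE tags (tag_id INTEGER PRIMARY KEY, tag_name TEXT);
-- Hypothetical junction table replacing the int[] column, which SQLite lacks.
CREATE TABLE template_tags (template_id INTEGER, tag_id INTEGER);
INSERT INTO templates VALUES (1, 'landing');
INSERT INTO tags VALUES (1, 'dark'), (2, 'minimal'), (3, 'blog');
INSERT INTO template_tags VALUES (1, 1), (1, 2), (1, 3);
""")

base = """
    SELECT t.template_id, group_concat(g.tag_name) AS tags
    FROM templates t
    JOIN template_tags tt ON tt.template_id = t.template_id
    JOIN tags g ON g.tag_id = tt.tag_id
"""

# Grouping by the tag key too: every (template, tag) pair is its own group.
per_tag = conn.execute(base + " GROUP BY t.template_id, g.tag_id").fetchall()

# Grouping by the template key only: one row per template.
per_template = conn.execute(base + " GROUP BY t.template_id").fetchall()

print(len(per_tag))       # 3 rows, one per tag
print(len(per_template))  # 1 row for the single template
```

The aggregate only has something to collapse when the tag key is left out of the grouping columns; otherwise each group holds exactly one tag.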
I got homework to get the average number of tags of a user per album (user_id = x) in the following tables:
CREATE TABLE USERS (ID INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
NAME TEXT NOT NULL);
CREATE TABLE ALBUMS (ID INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
NAME TEXT NOT NULL, CREATION_DATE TEXT NOT NULL,
USER_ID INTEGER REFERENCES USERS(USER_ID) NOT NULL);
CREATE TABLE PICTURES (ID INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
NAME TEXT NOT NULL,
LOCATION TEXT NOT NULL,
CREATION_DATE TEXT NOT NULL,
ALBUM_ID INTEGER REFERENCES ALBUMS(ALBUM_ID) NOT NULL);
CREATE TABLE TAGS (ID INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
PICTURE_ID INTEGER REFERENCES PICTURES(PICTURE_ID) NOT NULL,
USER_ID INTEGER REFERENCES USERS(USER_ID) NOT NULL);
Explanation:
Each tag is a row in TAGS and has a picture_id, each picture has an album_id, and each album has a user_id. Basically, I need to count how many times the user is tagged in each album and find the average number of times the user is tagged in an album.
I can only use: SELECT ? FROM, AVG(), COUNT(), JOIN (INNER, LEFT, RIGHT, FULL JOIN), ON, IN, AND, OR, LIKE, NOT, (=, !=, >, <), IS, DISTINCT, ORDER BY (ASC/DESC), LIMIT, OFFSET, and WHERE. That means I cannot use GROUP BY.
I tried this:
SELECT * FROM TAGS INNER JOIN PICTURES ON tags.picture_id = PICTURES.Id where album_id IN (select id from ALBUMS where user_id = x) AND user_id = x;
but it only gives me a table with all the tags of the user.
How can I get the average tags per album for (user_id = x)? Is this even possible?
First count how many times the user is tagged in each album and then get the average of these counters:
select
avg(counter) averagetags
from (
select count(t.user_id) counter
from albums a
inner join pictures p on p.album_id = a.id
inner join tags t on t.picture_id = p.id
where t.user_id = ?
group by a.id
)
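A runnable sketch of that query with Python's sqlite3 module, on hypothetical sample data (user 7, album and picture ids are made up). Since the question rules out GROUP BY, the sketch also tries an alternative that divides the total tag count by the number of distinct albums; that variant assumes plain arithmetic operators are allowed, which the question does not state.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE albums   (id INTEGER PRIMARY KEY, user_id INTEGER NOT NULL);
CREATE TABLE pictures (id INTEGER PRIMARY KEY, album_id INTEGER NOT NULL);
CREATE TABLE tags     (id INTEGER PRIMARY KEY, picture_id INTEGER NOT NULL,
                       user_id INTEGER NOT NULL);
-- Hypothetical data: user 7 is tagged 3 times in album 1 and once in album 2.
INSERT INTO albums VALUES (1, 7), (2, 7), (3, 8);
INSERT INTO pictures VALUES (10, 1), (11, 1), (12, 1), (20, 2), (30, 3);
INSERT INTO tags (picture_id, user_id) VALUES
    (10, 7), (11, 7), (12, 7), (20, 7), (30, 8), (10, 8);
""")

user_id = 7

# The answer's approach: count per album, then average the counters.
grouped = conn.execute("""
    SELECT AVG(counter) FROM (
        SELECT COUNT(t.user_id) AS counter
        FROM albums a
        INNER JOIN pictures p ON p.album_id = a.id
        INNER JOIN tags t ON t.picture_id = p.id
        WHERE t.user_id = ?
        GROUP BY a.id
    )
""", (user_id,)).fetchone()[0]

# Variant without GROUP BY: total tags divided by the number of albums
# in which the user is tagged at least once (assumes * and / are permitted).
ungrouped = conn.execute("""
    SELECT COUNT(*) * 1.0 / COUNT(DISTINCT p.album_id)
    FROM pictures p
    INNER JOIN tags t ON t.picture_id = p.id
    WHERE t.user_id = ?
""", (user_id,)).fetchone()[0]

print(grouped, ungrouped)
```

Both forms only consider albums where the user is tagged at least once, because the inner joins drop albums without a matching tag.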
I am trying to join reviews and likes onto products, but for some reason the output of the "reviews" column is duplicated by the number of rows in another joined table, likes. The length of the "reviews" output is
amount of likes * amount of reviews
I have no idea why this is happening.
My desired output is a "reviews" column containing an array of JSON objects, where each element corresponds to one row of a related review.
Products
Title Image
----------------------
Photo photo.jpg
Book book.jpg
Table table.jpg
Users
Username
--------
Admin
John
Jane
Product Likes
product_id user_id
---------------------
1 1
1 2
2 1
2 3
Product Reviews
product_id user_id review
-------------------------------------
1 1 Great Product!
1 2 Looks Great
2 1 Could be better
This is the query
SELECT "products".*,
array_to_json(array_agg("product_review".*)) as reviews,
EXISTS(SELECT * FROM product_like lk
JOIN users u ON u.id = "lk"."user_id" WHERE u.id = 4
AND "lk"."product_id" = products.id) AS liked,
COUNT("product_like"."product_id") AS totalLikes from "products"
LEFT JOIN "product_review" on "product_review"."product_id" = "products"."id"
LEFT JOIN "product_like" on "product_like"."product_id" = "products"."id"
group by "products"."id"
Query to create schema and insert data
CREATE TABLE products
(id SERIAL, title varchar(50), image varchar(50), PRIMARY KEY(id))
;
CREATE TABLE users
(id SERIAL, username varchar(50), PRIMARY KEY(id))
;
INSERT INTO products
(title,image)
VALUES
('Photo', 'photo.jpg'),
('Book', 'book.jpg'),
('Table', 'table.jpg')
;
INSERT INTO users
(username)
VALUES
('Admin'),
('John'),
('Jane')
;
CREATE TABLE product_review
(id SERIAL, product_id int NOT NULL, user_id int NOT NULL, review varchar(50), PRIMARY KEY(id), FOREIGN KEY (product_id) references products, FOREIGN KEY (user_id) references users)
;
INSERT INTO product_review
(product_id, user_id, review)
VALUES
(1, 1, 'Great Product!'),
(1, 2, 'Looks Great'),
(2, 1, 'Could be better')
;
CREATE TABLE product_like
(id SERIAL, product_id int NOT NULL, user_id int NOT NULL, PRIMARY KEY(id), FOREIGN KEY (product_id) references products, FOREIGN KEY (user_id) references users)
;
INSERT INTO product_like
(product_id, user_id)
VALUES
(1, 1),
(1, 2),
(2, 1),
(2, 3)
fiddle with the schema and query:
http://sqlfiddle.com/#!15/dff2c/1
Thanks in advance
The reason you are getting multiple results is that products has one-to-many relationships with both product_review and product_like, so joining both at once duplicates rows before aggregation. To work around that, you need to perform the aggregation of those tables in subqueries and join the derived tables instead:
SELECT "products".*,
"pr"."reviews",
EXISTS(SELECT * FROM product_like lk
JOIN users u ON u.id = "lk"."user_id" WHERE u.id = 4
AND "lk"."product_id" = products.id) AS liked,
COALESCE("pl"."totalLikes", 0) AS totalLikes
FROM "products"
LEFT JOIN (SELECT product_id, array_to_json(array_agg("product_review".*)) AS reviews
FROM "product_review"
GROUP BY product_id) "pr" on "pr"."product_id" = "products"."id"
LEFT JOIN (SELECT product_id, COUNT(*) AS "totalLikes"
FROM "product_like"
GROUP BY product_id) "pl" on "pl"."product_id" = "products"."id"
Output:
id title image reviews liked totallikes
1 Photo photo.jpg [{"id":1,"product_id":1,"user_id":1,"review":"Great Product!"},{"id":2,"product_id":1,"user_id":2,"review":"Looks Great"}] f 2
2 Book book.jpg [{"id":3,"product_id":2,"user_id":1,"review":"Could be better"}] f 2
3 Table table.jpg f 0
Demo on dbfiddle
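The row multiplication can be reproduced with a short sketch in Python's sqlite3 module, using the sample data from the question. Plain counts stand in for the JSON aggregation, and the trimmed-down column lists are a simplification made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE product_review (id INTEGER PRIMARY KEY, product_id INTEGER, review TEXT);
CREATE TABLE product_like (id INTEGER PRIMARY KEY, product_id INTEGER, user_id INTEGER);
INSERT INTO products (title) VALUES ('Photo'), ('Book'), ('Table');
INSERT INTO product_review (product_id, review) VALUES
    (1, 'Great Product!'), (1, 'Looks Great'), (2, 'Could be better');
INSERT INTO product_like (product_id, user_id) VALUES (1, 1), (1, 2), (2, 1), (2, 3);
""")

# Joining both child tables first pairs every review with every like.
naive = conn.execute("""
    SELECT p.id, COUNT(r.id) AS review_rows, COUNT(l.id) AS like_rows
    FROM products p
    LEFT JOIN product_review r ON r.product_id = p.id
    LEFT JOIN product_like   l ON l.product_id = p.id
    GROUP BY p.id
    ORDER BY p.id
""").fetchall()

# Aggregating each child table on its own keeps the counts independent.
fixed = conn.execute("""
    SELECT p.id, COALESCE(r.n, 0) AS reviews, COALESCE(l.n, 0) AS likes
    FROM products p
    LEFT JOIN (SELECT product_id, COUNT(*) AS n
               FROM product_review GROUP BY product_id) r ON r.product_id = p.id
    LEFT JOIN (SELECT product_id, COUNT(*) AS n
               FROM product_like GROUP BY product_id) l ON l.product_id = p.id
    ORDER BY p.id
""").fetchall()

print(naive)  # [(1, 4, 4), (2, 2, 2), (3, 0, 0)]
print(fixed)  # [(1, 2, 2), (2, 1, 2), (3, 0, 0)]
```

Product 1 has 2 reviews and 2 likes, so the naive join produces 2 * 2 = 4 rows for it before grouping, which is exactly the "likes * reviews" effect described in the question.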
For simplicity, I will write the minimum number of fields in the tables.
Suppose I have these tables: items, item_photos, items_characteristics.
create table items (
id bigserial primary key,
title jsonb not null
);
create table item_photos (
id bigserial primary key,
path varchar(1000) not null,
item_id bigint references items (id) not null,
sort_order smallint not null,
unique (path, item_id)
);
create table items_characteristics (
item_id bigint references items (id),
characteristic_id bigint references characteristics (id),
characteristic_option_id bigint references characteristic_options (id),
numeric_value numeric(19, 2),
primary key (item_id, characteristic_id),
unique (item_id, characteristic_id, characteristic_option_id));
And I want to aggregate all the photos and characteristics of one item.
For a start, I got this.
select i.id as id,
i.title as title,
array_agg( ip.path) as photos,
array_agg(
array [ico.characteristic_id, ico.characteristic_option_id, ico.numeric_value]) as characteristics_array
FROM items i
LEFT JOIN item_photos ip on i.id = ip.item_id
LEFT JOIN items_characteristics ico on ico.item_id = i.id
GROUP BY i.id
The first problem is that if there are 4 entries in items_characteristics that relate to one item and item_photos has no entries for it, I get an array of four null elements in the photos field: {null, null, null, null}.
So I had to use array_remove:
array_remove(array_agg(ip.path), null) as photos
Further, if I have 1 photo and 4 characteristics, the photo entry is duplicated 4 times, for example: {img/test-img-1.png,img/test-img-1.png,img/test-img-1.png,img/test-img-1.png}
So I had to use distinct:
array_remove(array_agg(distinct ip.path), null) as photos,
array_agg(distinct
array [ico.characteristic_id, ico.characteristic_option_id, ico.numeric_value]) as characteristics_array
This solution seems rather awkward to me.
The situation is complicated by the fact that I had to add 2 more fields to items_characteristics:
string_value jsonb, --string value
json_value jsonb --custom value
So now I need to aggregate 5 values from items_characteristics, 2 of which are jsonb, and distinct could have a very negative impact on performance.
Is there any more elegant solution?
Aggregate before joining:
SELECT i.id as id, i.title as title, ip.paths as photos,
ico.characteristics_array
FROM items i LEFT JOIN
(SELECT ip.item_id, array_agg( ip.path) as paths
FROM item_photos ip
GROUP BY ip.item_ID
) ip
ON ip.item_id = i.id LEFT JOIN
(SELECT ico.item_id,
array_agg(array [ico.characteristic_id, ico.characteristic_option_id, ico.numeric_value]
) as characteristics_array
FROM items_characteristics ico
GROUP BY ico.item_id
) ico
ON ico.item_id = i.id
For starters, a small entity-relationship diagram:
Diagram relations-entities http://img11.hostingpics.net/pics/32979039DB.png
And now, a dataset
Archive
create :
CREATE TABLE archive (
id integer NOT NULL,
parent_id integer,
code character varying(15) NOT NULL,
label text NOT NULL
);
ALTER TABLE ONLY archive ADD CONSTRAINT archive_pkey PRIMARY KEY (id);
CREATE INDEX idx_142 ON archive USING btree (parent_id);
CREATE UNIQUE INDEX uniq_14242 ON archive USING btree (code);
ALTER TABLE ONLY archive ADD CONSTRAINT fk_14242 FOREIGN KEY (parent_id) REFERENCES archive(id);
insert :
INSERT INTO archive VALUES (1, NULL, 'B28', 'Confidential');
INSERT INTO archive VALUES (2, 1, 'B28.0', 'Nuclear zone');
Keyword
create :
CREATE TABLE keyword (
id integer NOT NULL,
label text NOT NULL,
label_double_metaphone text NOT NULL
);
ALTER TABLE ONLY keyword ADD CONSTRAINT eyword_pkey PRIMARY KEY (id);
CREATE UNIQUE INDEX uniq_242 ON keyword USING btree (label);
insert :
INSERT INTO keyword VALUES (1, 'SECURITY', 'SKRT');
INSERT INTO keyword VALUES (2, 'AREA', 'AR');
INSERT INTO keyword VALUES (3, 'NUCLEAR', 'NKLR');
Assoc_kw_archive
create :
CREATE TABLE assoc_kw_archive (
id integer NOT NULL,
keyword_id integer,
archive_id integer,
weight integer NOT NULL
);
ALTER TABLE ONLY assoc_kw_archive ADD CONSTRAINT assoc_kw_archive_pkey PRIMARY KEY (id);
CREATE INDEX idx_3421 ON assoc_kw_archive USING btree (archive_id);
CREATE INDEX idx_3422 ON assoc_kw_archive USING btree (keyword_id);
ALTER TABLE ONLY assoc_kw_archive ADD CONSTRAINT fk_3421 FOREIGN KEY (archive_id) REFERENCES archive(id);
ALTER TABLE ONLY assoc_kw_archive ADD CONSTRAINT fk_3422 FOREIGN KEY (keyword_id) REFERENCES keyword(id);
insert :
INSERT INTO assoc_kw_archive VALUES (1, 1, 1, 10);
INSERT INTO assoc_kw_archive VALUES (2, 1, 2, 20);
INSERT INTO assoc_kw_archive VALUES (3, 2, 2, 30);
INSERT INTO assoc_kw_archive VALUES (4, 3, 2, 30);
The target
The goal here is to search the database. The search is based on a string typed by a user, and the output is a list of archives sorted by relevance. The relevance of an archive depends on three factors:
Users can misspell a word, etc.
The weight of a word, which gives it importance
A bonus for archives that include the x keywords typed by the user
I have worked on different versions of the SQL query, but now I can't step back and look at the overall problem.
The archive table has 100,000 rows, the keyword table has 80,000, and there are 1,000,000 associations between these two entities.
This is my latest version. It works, but it is very slow:
select f.id, f.code, f.label, min(f.dist) as distF, max(f.poid) as poidF
from
(
select
a.id,
a.code,
a.label,
( ( levenshtein(lower('Security'), lower(k1.label)) + 1 ) + ( levenshtein(lower('Nuclear'), lower(k2.label)) + 1 ) ) as dist,
( ka1.weight + ka2.weight ) as poid
from archive a
inner join assoc_kw_archive ka1
on ka1.archive_id = a.id
inner join keyword k1
on k1.id = ka1.keyword_id
inner join assoc_kw_archive ka2
on ka2.archive_id = a.id
inner join keyword k2
on k2.id = ka2.keyword_id
where levenshtein(dmetaphone('Security'), k1.label_double_metaphone) < 2
and levenshtein(dmetaphone('Nuclear'), k2.label_double_metaphone) < 2
) as f
group by f.id, f.code, f.label
order by distF asc, poidF desc
limit 10;
I made one join per keyword, and that is what makes it slow! But I can't find another solution.
I think the problem is doing the full join with the distance calculation. Here is an alternative approach: filter the keywords first, keeping the distance information from the where clause by computing it in a subquery, then use conditional aggregation to get the information you want.
The query ends up looking something like:
select a.id, a.code, a.label,
min(levenshtein(lower('Security'), lower(case when securityl < 2 then k.label end)) + 1) +
min(levenshtein(lower('Nuclear'), lower(case when nuclearl < 2 then k.label end)) + 1) as mindist,
sum(weight) as poid
from archive a inner join
assoc_kw_archive ka
on ka.archive_id = a.id inner join
(select k.*
from (select k.*, levenshtein(dmetaphone('Security'), k.label_double_metaphone) as securityl,
levenshtein(dmetaphone('Nuclear'), k.label_double_metaphone) as nuclearl
from keyword k
) k
where k.securityl < 2 or
k.nuclearl < 2
) k
on k.id = ka.keyword_id
group by a.id, a.code, a.label
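The filter-first idea can be traced outside the database with a rough Python sketch over the question's sample rows. The levenshtein helper is a plain textbook implementation, and the phonetic codes for the two search terms are copied from the sample data as an assumption, since dmetaphone is a PostgreSQL function with no standard-library counterpart.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]

# Sample rows from the question: (id, label, label_double_metaphone).
keywords = [(1, "SECURITY", "SKRT"), (2, "AREA", "AR"), (3, "NUCLEAR", "NKLR")]
# (keyword_id, archive_id, weight)
assoc = [(1, 1, 10), (1, 2, 20), (2, 2, 30), (3, 2, 30)]

# Assumption: phonetic codes of the search terms, taken from the sample data
# rather than computed here.
terms = {"Security": "SKRT", "Nuclear": "NKLR"}

# Step 1: filter the keywords once, remembering which term each one matches.
matches = {}  # keyword_id -> {term: spelling distance + 1}
for kid, label, code in keywords:
    for term, term_code in terms.items():
        if levenshtein(term_code, code) < 2:
            dist = levenshtein(term.lower(), label.lower()) + 1
            matches.setdefault(kid, {})[term] = dist

# Step 2: a single pass over the associations, aggregating per archive.
results = {}  # archive_id -> {"dist": {term: min distance}, "weight": total}
for kid, archive_id, weight in assoc:
    if kid not in matches:
        continue
    entry = results.setdefault(archive_id, {"dist": {}, "weight": 0})
    entry["weight"] += weight
    for term, dist in matches[kid].items():
        entry["dist"][term] = min(dist, entry["dist"].get(term, dist))

print(results)
```

The expensive distance function runs once per keyword and term instead of once per joined row pair, and the association table is scanned a single time. Unlike the original two-join query, an archive that matches only one of the terms still appears here; a caller who wants the original behaviour could keep only the entries whose "dist" covers every term.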