PostgreSQL - key-value pair normalization - sql

I'm trying to design a schema (on Postgres but any other SQL's fine) that supports the following requirements:
Each document (a row in table documents) has a unique string ID (id field) and several other data fields.
Each document can have zero or more tags (string key-value pairs) attached to it, and the goal is to build a system that lets users sort or filter documents using those key-value pairs. E.g. "Show me all documents that have a tag "key1" with the value "value1" AND sort the output by the value of the tag "key3"."
So DDL should look like this: (simplified)
create table documents
(
id char(32) not null
constraint documents_pkey
primary key,
data varchar(2000),
created_at timestamp,
updated_at timestamp
);
create table document_tags
(
id serial not null
constraint document_tags_pkey
primary key,
document_id char(32) not null
constraint document_tags_documents_id_fk
references documents
on update cascade on delete cascade,
tag_key varchar(200) not null,
tag_value varchar(2000) not null
);
Now my question is: how can I build a query that filters and sorts on the tag keys and values? E.g. return all documents (possibly with LIMIT/OFFSET) that have both a "key1" = "value1" tag and a "key2" = "value2" tag, sorted by the value of the "key3" tag.

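You can use group by and having. The where clause must keep the rows for every key involved; the having clause then checks each required pair per document, and the order by picks out the "key3" value:
select dt.document_id
from document_tags dt
where dt.tag_key in ('key1', 'key2', 'key3')
group by dt.document_id
having sum(case when dt.tag_key = 'key1' and dt.tag_value = 'value1' then 1 else 0 end) > 0
and sum(case when dt.tag_key = 'key2' and dt.tag_value = 'value2' then 1 else 0 end) > 0
order by max(case when dt.tag_key = 'key3' then dt.tag_value end);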
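The group by and having approach can be tried out with a minimal sketch like the following. It assumes an in-memory SQLite database (via Python's sqlite3 module) standing in for Postgres, a document_tags table trimmed to its three tag columns, and made-up sample rows; the names QUERY, find_documents and build_sample_db are hypothetical.

```python
import sqlite3

# Conditional aggregation: one group per document, HAVING checks that each
# required key/value pair occurs, ORDER BY picks the value stored under key3.
QUERY = """
SELECT dt.document_id
FROM document_tags dt
GROUP BY dt.document_id
HAVING SUM(CASE WHEN dt.tag_key = 'key1' AND dt.tag_value = 'value1' THEN 1 ELSE 0 END) > 0
   AND SUM(CASE WHEN dt.tag_key = 'key2' AND dt.tag_value = 'value2' THEN 1 ELSE 0 END) > 0
ORDER BY MAX(CASE WHEN dt.tag_key = 'key3' THEN dt.tag_value END)
LIMIT ? OFFSET ?
"""


def find_documents(conn, limit=10, offset=0):
    """Return the ids of matching documents, sorted by their key3 value."""
    return [row[0] for row in conn.execute(QUERY, (limit, offset))]


def build_sample_db():
    """Create a throwaway database with a few illustrative tag rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE document_tags ("
        "document_id TEXT NOT NULL, tag_key TEXT NOT NULL, tag_value TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO document_tags VALUES (?, ?, ?)",
        [
            ("doc_a", "key1", "value1"), ("doc_a", "key2", "value2"), ("doc_a", "key3", "zebra"),
            ("doc_b", "key1", "value1"), ("doc_b", "key2", "value2"), ("doc_b", "key3", "apple"),
            # doc_c has key2 with a different value, so it must be filtered out
            ("doc_c", "key1", "value1"), ("doc_c", "key2", "other"), ("doc_c", "key3", "aaa"),
        ],
    )
    return conn


if __name__ == "__main__":
    print(find_documents(build_sample_db()))
```

With these sample rows only doc_a and doc_b carry both required tags, and doc_b sorts first because its "key3" value is "apple".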

Related

How to efficiently insert ENUM value into table?

Consider the following schema:
CREATE TABLE IF NOT EXISTS snippet_types (
id INTEGER NOT NULL PRIMARY KEY,
name TEXT NOT NULL UNIQUE
);
CREATE TABLE IF NOT EXISTS snippets (
id INTEGER NOT NULL PRIMARY KEY,
title TEXT,
content TEXT,
type INTEGER NOT NULL,
FOREIGN KEY(type) REFERENCES snippet_types(id)
);
This schema assumes a one-to-many relationship between the tables and allows efficiently maintaining a set of ENUMs in the snippet_types table. The efficiency comes from not having to store the whole string describing the snippet type in the snippets table, but this decision also causes some inconvenience: before inserting, we need to retrieve the type id from snippet_types, which means one more select and a check:
SELECT id FROM snippet_types WHERE name = 'foo';
-- ...check that > 0 rows returned, then use the fetched id below...
INSERT INTO snippets (title, content, type) VALUES ('bar', 'buz', id);
We could also combine the insert and the select into one statement like this:
INSERT INTO snippets (title, content, type)
SELECT 'bar', 'buz', id FROM snippet_types WHERE name = 'foo';
However, if the "foo" type is missing from snippet_types, then 0 rows are inserted and no error is returned, and I don't see a way to get the number of rows SQLite actually inserted.
How can I insert ENUM-containing tuple in one query?
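One possible way to detect the "0 rows inserted" case is to look at the row count the driver reports after the INSERT ... SELECT. The sketch below assumes Python's sqlite3 module, whose cursor exposes a rowcount attribute for INSERT statements; insert_snippet and build_sample_db are hypothetical helper names, and the sample type row is made up.

```python
import sqlite3


def insert_snippet(conn, title, content, type_name):
    """Insert a snippet, resolving the type name to its id in the same statement."""
    cur = conn.execute(
        "INSERT INTO snippets (title, content, type) "
        "SELECT ?, ?, id FROM snippet_types WHERE name = ?",
        (title, content, type_name),
    )
    # rowcount is 0 when no snippet_types row matched, so nothing was inserted
    if cur.rowcount != 1:
        raise LookupError(f"unknown snippet type: {type_name!r}")
    return cur.lastrowid


def build_sample_db():
    """Create the two tables from the question plus one illustrative type."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE snippet_types (
            id INTEGER NOT NULL PRIMARY KEY,
            name TEXT NOT NULL UNIQUE
        );
        CREATE TABLE snippets (
            id INTEGER NOT NULL PRIMARY KEY,
            title TEXT,
            content TEXT,
            type INTEGER NOT NULL,
            FOREIGN KEY(type) REFERENCES snippet_types(id)
        );
        INSERT INTO snippet_types (id, name) VALUES (1, 'foo');
        """
    )
    return conn


if __name__ == "__main__":
    conn = build_sample_db()
    insert_snippet(conn, "bar", "buz", "foo")
    print(conn.execute("SELECT title, content, type FROM snippets").fetchall())
```

Outside of a driver, SQLite's own changes() SQL function reports the same count for the most recently completed INSERT, UPDATE or DELETE.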

Update value inside json matching a criteria

I have entries (json datatype) in my database with some typos, and I want to replace them with the correct text.
I'm using PostgreSQL 9.4.
This is what I'm trying:
UPDATE table
SET infos = infos || '{ "key1": { "key2": "new text with no typo" } }'
WHERE contact_id = (SELECT contact_id FROM table WHERE infos::TEXT LIKE '%criteria%');
DDL:
CREATE TABLE public.timeline (
contact_id serial NOT NULL,
infos json NULL,
CONSTRAINT contacts_pkey PRIMARY KEY (contact_id)
);
I expect this to update all rows containing the typo automatically.
Solution (note that jsonb_set requires PostgreSQL 9.5 or later; the new value is passed as a JSON string, and the result is cast back to json to match the column type):
UPDATE timeline
SET infos = jsonb_set(to_jsonb(infos), '{key1,key2}', '"new value to replace"', false)::json
WHERE infos->'key1'->>'key2' LIKE '%word_to_match%';

Query optimization: connecting meta data to a value list table

I have a database containing a table with data and a meta data table. I want to create a View that selects certain meta data belonging to an item and list it as a column.
The basic query for the view is: SELECT * FROM item. The item table is defined as:
CREATE TABLE item (
id INTEGER PRIMARY KEY AUTOINCREMENT
UNIQUE
NOT NULL,
traceid INTEGER REFERENCES trace (id)
NOT NULL,
freq BIGINT NOT NULL,
value REAL NOT NULL
);
The meta data to be added follow the schema "metadata.parameter='name'"
The meta table is defined as:
CREATE TABLE metadata (
id INTEGER PRIMARY KEY AUTOINCREMENT
UNIQUE
NOT NULL,
parameter STRING NOT NULL
COLLATE NOCASE,
value STRING NOT NULL
COLLATE NOCASE,
datasetid INTEGER NOT NULL
REFERENCES dataset (id),
traceid INTEGER REFERENCES trace (id),
itemid INTEGER REFERENCES item (id)
);
The "name" parameter should be selected this way:
if a record exists where parameter is "name" and itemid matches item.id, then its value should be included in the record.
otherwise, if a record exists where parameter is "name", "itemid" is NULL, and traceid matches item.traceid, its value should be used
otherwise, the result should be NULL, but the record from the item table should be included anyway
Currently, I use the following query to achieve this goal:
SELECT i.*,
COALESCE (
MAX(CASE WHEN m.parameter='name' THEN m.value END),
MAX(CASE WHEN m2.parameter='name' THEN m2.value END)
) AS itemname
FROM item i
LEFT JOIN metadata m
ON (m.itemid = i.id AND m.parameter='name')
LEFT JOIN metadata m2
ON (m2.itemid IS NULL AND m2.traceid = i.traceid AND m2.parameter='name')
GROUP BY i.id
This query however is somewhat inefficient, as the metadata table is used twice and contains many more records than just the "name" ones. So I am looking for a way to improve speed, especially regarding the case that some extensions are about to be implemented:
there is a third level, "dataset", that should be included: if no "parameter=name" record exists with a matching "itemid" or a matching "traceid", a "parameter=name" record with the same datasetid as the item should be used (the datasetid for an item will be looked up through another table which connects traceid and datasetid)
more meta data should be queried by the view following the same schema
Any help is appreciated.
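First of all, you can use one join instead of two. As a LEFT JOIN, it still returns items that have no matching metadata:
LEFT JOIN metadata m ON (m.parameter='name' AND (m.itemid = i.id OR (m.itemid IS NULL AND m.traceid = i.traceid)))
An item can match both an item-level and a trace-level row through this join, so a plain m.value would pick one of them arbitrarily. Keep the COALESCE, but apply it to the single join so that the item-level value wins:
COALESCE(MAX(CASE WHEN m.itemid IS NOT NULL THEN m.value END), MAX(CASE WHEN m.itemid IS NULL THEN m.value END)) AS itemname
The result should look like this:
SELECT i.*,
COALESCE(MAX(CASE WHEN m.itemid IS NOT NULL THEN m.value END), MAX(CASE WHEN m.itemid IS NULL THEN m.value END)) AS itemname
FROM item i
LEFT JOIN metadata m ON (m.parameter='name' AND (m.itemid = i.id OR (m.itemid IS NULL AND m.traceid = i.traceid)))
GROUP BY i.id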
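The three selection rules from the question (item-level name first, then trace-level name, otherwise NULL) can be checked against a single-join view with a small sketch. It assumes an in-memory SQLite database through Python's sqlite3 module, a schema trimmed to the columns the rules need (no dataset or trace tables), and invented sample rows; the view name item_with_name and the helper names are hypothetical.

```python
import sqlite3

SCHEMA = """
CREATE TABLE item (
    id INTEGER PRIMARY KEY,
    traceid INTEGER NOT NULL,
    freq BIGINT NOT NULL,
    value REAL NOT NULL
);
CREATE TABLE metadata (
    id INTEGER PRIMARY KEY,
    parameter TEXT NOT NULL COLLATE NOCASE,
    value TEXT NOT NULL COLLATE NOCASE,
    traceid INTEGER,
    itemid INTEGER
);
-- One LEFT JOIN on metadata; the COALESCE prefers the item-level row
-- and falls back to the trace-level row (itemid IS NULL).
CREATE VIEW item_with_name AS
SELECT i.id, i.traceid, i.freq, i.value,
       COALESCE(
           MAX(CASE WHEN m.itemid IS NOT NULL THEN m.value END),
           MAX(CASE WHEN m.itemid IS NULL THEN m.value END)
       ) AS itemname
FROM item i
LEFT JOIN metadata m
       ON m.parameter = 'name'
      AND (m.itemid = i.id OR (m.itemid IS NULL AND m.traceid = i.traceid))
GROUP BY i.id;
"""


def build_sample_db():
    """Three items: one with its own name, one inheriting it, one without any."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO item (id, traceid, freq, value) VALUES (?, ?, ?, ?)",
        [(1, 10, 100, 0.5), (2, 10, 200, 0.7), (3, 20, 300, 0.9)],
    )
    conn.executemany(
        "INSERT INTO metadata (parameter, value, traceid, itemid) VALUES (?, ?, ?, ?)",
        [
            ("name", "item one", 10, 1),      # item-level name for item 1
            ("name", "trace ten", 10, None),  # trace-level name for trace 10
            ("unit", "dBm", 20, None),        # not a 'name' row, must be ignored
        ],
    )
    return conn


def names_by_item(conn):
    """Return (item id, resolved name) pairs ordered by item id."""
    return conn.execute("SELECT id, itemname FROM item_with_name ORDER BY id").fetchall()


if __name__ == "__main__":
    print(names_by_item(build_sample_db()))
```

Item 1 keeps its own name, item 2 inherits the trace-level name, and item 3 is still listed with a NULL name. A dataset-level fallback would presumably become a third COALESCE argument, but that depends on the lookup table the question does not show.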

How can I execute an `INSERT INTO` on a table with a string primary key?

I have two tables that are initialized with something like this:
create table foo (
"id" varchar(254) not null primary key,
"first_name" varchar(254) not null);
create table my_user (
"id" serial not null primary key,
"role" varchar(254) not null,
"first_name" varchar(254) not null);
The reason the id column of foo is a varchar(254) instead of a serial is that in normal operation I insert an id provided by Google OAuth2 instead of generating my own id values.
I now have a set of records in a third table I call temp with the first_name column. I'm trying to emulate this post, but I'm not sure how to do so for string primary keys.
select * from (insert into my_user(id, role)
('some id value I want to generate, like historical || incrementing number',
[a fixed number],
select first_name from temp) returning id);
As it says in the official Postgres documentation, I know I need to get the arguments following the insert statement into the format of a table that matches the declaration of my_user. I guess I'm just lost as to how to generate a column of the ids I want here, or even a column of one number repeating.
Thanks for reading
You could insert a UUID (it's like a GUID) as your ID. Collisions are so unlikely that it can be treated as unique in practice.
Sadly it's a little complex to load the module: Generating a UUID in Postgres for Insert statement?
Ah... and what wildplasser said, +1! :-)
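If a readable id such as a prefix plus an incrementing number is preferred over a UUID, the id column can be generated inside an INSERT ... SELECT. The following is only a sketch under several assumptions: SQLite (3.25 or later, for window functions) through Python's sqlite3 module stands in for Postgres, the target is taken to be the foo table with the string primary key, the sample names are invented, and copy_with_generated_ids and build_sample_db are hypothetical helpers.

```python
import sqlite3


def copy_with_generated_ids(conn, prefix="historical"):
    """Copy every first_name from temp into foo with ids like 'historical1'."""
    cur = conn.execute(
        # row_number() numbers the selected rows 1, 2, 3, ... and is
        # concatenated onto the fixed prefix to form the string key.
        'INSERT INTO foo (id, first_name) '
        'SELECT ? || row_number() OVER (ORDER BY first_name), first_name '
        'FROM "temp"',
        (prefix,),
    )
    return cur.rowcount


def build_sample_db():
    """Create foo and temp as in the question, with two illustrative names."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE foo (
            id VARCHAR(254) NOT NULL PRIMARY KEY,
            first_name VARCHAR(254) NOT NULL
        );
        -- "temp" is quoted because TEMP is also an SQL keyword
        CREATE TABLE "temp" (first_name VARCHAR(254) NOT NULL);
        INSERT INTO "temp" (first_name) VALUES ('Bob'), ('Alice');
        """
    )
    return conn


if __name__ == "__main__":
    conn = build_sample_db()
    copy_with_generated_ids(conn)
    print(conn.execute("SELECT id, first_name FROM foo ORDER BY id").fetchall())
```

A fixed value for every row, such as a constant role, can be added to the select list in the same way as the prefix.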

Why does this query only select a single row?

SELECT * FROM tbl_houses
WHERE
(SELECT HousesList
FROM tbl_lists
WHERE tbl_lists.ID = '123') LIKE CONCAT('% ', tbl_houses.ID, '#')
It only selects the row from tbl_houses whose ID occurs last in tbl_lists.HousesList.
I need it to select all rows from tbl_houses whose ID appears anywhere in tbl_lists.HousesList.
It's hard to tell without knowing exactly what your data looks like, but if it only matches the last ID, it's probably because you don't have any % at the end of the string, so as to allow for the list to continue after the match.
Is that a database in zeroth normal form I smell?
If you have attributes containing lists of values, like that HousesList attribute, you should instead be storing those as distinct values in a separate relation.
CREATE TABLE house (
id VARCHAR NOT NULL,
PRIMARY KEY (id)
);
CREATE TABLE list (
id VARCHAR NOT NULL,
PRIMARY KEY (id)
);
CREATE TABLE listitem (
list_id VARCHAR NOT NULL,
FOREIGN KEY (list_id) REFERENCES list (id),
house_id VARCHAR NOT NULL,
FOREIGN KEY (house_id) REFERENCES house (id),
PRIMARY KEY (list_id, house_id)
);
Then your distinct house listing values each have their own tuple, and can be selected like any other.
SELECT house.*
FROM house
JOIN listitem
ON listitem.house_id = house.id
WHERE
listitem.list_id = '123';
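To see the normalized layout return every house on a list rather than just the last one, here is a small sketch. It assumes an in-memory SQLite database through Python's sqlite3 module, the house, list and listitem tables described above, and invented ids; houses_in_list and build_sample_db are hypothetical names.

```python
import sqlite3


def houses_in_list(conn, list_id):
    """Return the ids of all houses that belong to the given list."""
    rows = conn.execute(
        "SELECT house.id FROM house "
        "JOIN listitem ON listitem.house_id = house.id "
        "WHERE listitem.list_id = ? "
        "ORDER BY house.id",
        (list_id,),
    )
    return [row[0] for row in rows]


def build_sample_db():
    """Three houses and two lists; list '123' contains h1 and h3."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE house (id VARCHAR NOT NULL, PRIMARY KEY (id));
        CREATE TABLE list (id VARCHAR NOT NULL, PRIMARY KEY (id));
        CREATE TABLE listitem (
            list_id VARCHAR NOT NULL,
            house_id VARCHAR NOT NULL,
            FOREIGN KEY (list_id) REFERENCES list (id),
            FOREIGN KEY (house_id) REFERENCES house (id),
            PRIMARY KEY (list_id, house_id)
        );
        INSERT INTO house (id) VALUES ('h1'), ('h2'), ('h3');
        INSERT INTO list (id) VALUES ('123'), ('456');
        -- one row per (list, house) pair instead of a delimited string
        INSERT INTO listitem (list_id, house_id)
        VALUES ('123', 'h1'), ('123', 'h3'), ('456', 'h2');
        """
    )
    return conn


if __name__ == "__main__":
    print(houses_in_list(build_sample_db(), "123"))
```

Because each membership is its own row, the lookup is an ordinary join and no LIKE pattern over a delimited string is needed.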