Query optimization: connecting metadata to a value list table - sql

I have a database containing a data table and a metadata table. I want to create a view that selects certain metadata belonging to an item and lists it as a column.
The basic query for the view is: SELECT * FROM item. The item table is defined as:
CREATE TABLE item (
id INTEGER PRIMARY KEY AUTOINCREMENT
UNIQUE
NOT NULL,
traceid INTEGER REFERENCES trace (id)
NOT NULL,
freq BIGINT NOT NULL,
value REAL NOT NULL
);
The metadata to be added follow the pattern "metadata.parameter='name'".
The meta table is defined as:
CREATE TABLE metadata (
id INTEGER PRIMARY KEY AUTOINCREMENT
UNIQUE
NOT NULL,
parameter STRING NOT NULL
COLLATE NOCASE,
value STRING NOT NULL
COLLATE NOCASE,
datasetid INTEGER NOT NULL
REFERENCES dataset (id),
traceid INTEGER REFERENCES trace (id),
itemid INTEGER REFERENCES item (id)
);
The "name" parameter should be selected this way:
if a record exists where parameter is "name" and itemid matches item.id, then its value should be included in the record.
otherwise, if a record exists where parameter is "name", "itemid" is NULL, and traceid matches item.traceid, its value should be used
otherwise, the result should be NULL, but the record from the item table should be included anyway
Currently, I use the following query to achieve this goal:
SELECT i.*,
COALESCE (
MAX(CASE WHEN m.parameter='name' THEN m.value END),
MAX(CASE WHEN m2.parameter='name' THEN m2.value END)
) AS itemname
FROM item i
JOIN metadata m
ON (m.itemid = i.id AND m.parameter='name')
JOIN metadata m2
ON (m2.itemid IS NULL AND m2.traceid = i.traceid AND m2.parameter='name')
GROUP BY i.id
This query, however, is somewhat inefficient, as the metadata table is joined twice and contains many more records than just the "name" ones. So I am looking for a way to improve speed, especially since some extensions are about to be implemented:
there is a third level "dataset" that should be included: a "parameter=name" record should be used if it has the same datasetid as the item (looked up for each item via another table that connects traceid and datasetid), whenever no "parameter=name" record exists with either a matching "itemid" or a matching "traceid"
more metadata should be queried by the view following the same pattern
Any help is appreciated.

First of all, you can use one join instead of two, like this:
JOIN metadata m ON (m.parameter='name' AND (m.itemid = i.id OR (m.itemid IS NULL AND m.traceid = i.traceid)))
Then you can remove COALESCE, using simple select:
SELECT i.*, m.value as itemname
Result should look like this:
SELECT i.*, m.value as itemname
FROM item i
JOIN metadata m ON (m.parameter='name' AND (m.itemid = i.id OR (m.itemid IS NULL AND m.traceid = i.traceid)))
GROUP BY i.id
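Two caveats with this simplification, though: an inner JOIN drops items that have no "name" metadata at all (the question wants those kept with a NULL name), and with GROUP BY plus a bare m.value it is unspecified which of the two candidate rows wins when both an item-level and a trace-level match exist. A sketch (untested) that keeps the single join but addresses both points:
SELECT i.*,
COALESCE(
MAX(CASE WHEN m.itemid = i.id THEN m.value END),
MAX(CASE WHEN m.itemid IS NULL THEN m.value END)
) AS itemname
FROM item i
LEFT JOIN metadata m
ON m.parameter = 'name'
AND (m.itemid = i.id
OR (m.itemid IS NULL AND m.traceid = i.traceid))
GROUP BY i.id
The planned "dataset" level would slot in as a third COALESCE branch, with the join extended to match m.datasetid against the item's dataset (looked up through the unnamed table that connects traceid and datasetid), and each further parameter queried the same way becomes another COALESCE column over the same join.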

Related

Postgres query all results from one table blended with conditional data from another table

I have 2 SQL tables and I'm trying to generate a new table with data from the 2 tables.
Jobs table:
jobs (
id SERIAL PRIMARY KEY,
position TEXT NOT NULL,
location TEXT NOT NULL,
pay NUMERIC NOT NULL,
duration TEXT NOT NULL,
description TEXT NOT NULL,
term TEXT NOT NULL,
user_id INTEGER REFERENCES users(id) ON DELETE SET NULL
)
Applied table:
applied (
id SERIAL PRIMARY KEY,
completed BOOLEAN DEFAULT FALSE,
user_id INTEGER REFERENCES users(id) ON DELETE SET NULL,
job_id INTEGER REFERENCES jobs(id) ON DELETE SET NULL,
UNIQUE (user_id, job_id)
);
The populated tables look like this: [screenshots of the jobs and applied tables omitted]
I want my final query to return a table that matches the jobs table but has a new column called js_id with true or false based on whether the user has applied to that job. [expected-output screenshot omitted]
Here is the query I came up with to generate the above table:
SELECT DISTINCT on (jobs.id)
jobs.*, applied.user_id as applicant,
CASE WHEN applied.user_id = 1 THEN TRUE
ELSE FALSE END as js_id
FROM jobs
JOIN applied on jobs.id = applied.job_id;
However this doesn't work as I add more applicants to the table: I get different true and false values and I haven't been able to get it working. When I remove DISTINCT ON (jobs.id), my true values are consistent, but I wind up with a lot more rows than the 3 jobs I want. [result screenshot omitted]
I think you want exists:
SELECT j.*,
(EXISTS (SELECT 1
FROM applied a
WHERE a.job_id = j.id AND a.user_id = 1
)) AS js_id
FROM jobs j;
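For comparison, the same flag can be computed with a LEFT JOIN. A sketch: it stays one row per job without DISTINCT ON precisely because the UNIQUE (user_id, job_id) constraint guarantees at most one applied row per user and job:
SELECT j.*,
(a.id IS NOT NULL) AS js_id
FROM jobs j
LEFT JOIN applied a
ON a.job_id = j.id
AND a.user_id = 1; -- the user whose applications are being flagged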

PostgreSQL - key-value pair normalization

I'm trying to design a schema (on Postgres, but any other SQL is fine) that supports the following requirements:
Each document (a row in the documents table) has a unique string ID (the id field) and several other data fields.
Each document can have 0 or more tags (string key-value pairs) attached to it, and the goal is to build a system that lets users sort or filter documents using those key-value pairs. E.g. "Show me all documents that have a tag "key1" with value "value1", and sort the output using the value of tag "key3"."
So the DDL looks like this (simplified):
create table documents
(
id char(32) not null
constraint documents_pkey
primary key,
data varchar(2000),
created_at timestamp,
updated_at timestamp
);
create table document_tags
(
id serial not null
constraint document_tags_pkey
primary key,
document_id char(32) not null
constraint document_tags_documents_id_fk
references documents
on update cascade on delete cascade,
tag_key varchar(200) not null,
tag_value varchar(2000) not null
);
Now my question is: how can I build a query that does filtering/sorting using the tag key-values? E.g. return all documents (possibly with LIMIT/OFFSET) that have both the "key1" = "value1" and "key2" = "value2" tags, sorted by the value of the "key3" tag.
You can use group by and having:
select dt.document_id
from document_tags dt
group by dt.document_id
having max(case when dt.tag_key = 'key1' then dt.tag_value end) = 'value1'
and max(case when dt.tag_key = 'key2' then dt.tag_value end) = 'value2'
order by max(case when dt.tag_key = 'key3' then dt.tag_value end);
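To return the full document rows, and to apply the LIMIT/OFFSET the question mentions, one option is to wrap that aggregation in a join; a sketch, carrying the "key3" value out as a sort key:
select d.*
from documents d
join (
select dt.document_id,
max(case when dt.tag_key = 'key3' then dt.tag_value end) as sort_key
from document_tags dt
group by dt.document_id
having max(case when dt.tag_key = 'key1' then dt.tag_value end) = 'value1'
and max(case when dt.tag_key = 'key2' then dt.tag_value end) = 'value2'
) t on t.document_id = d.id
order by t.sort_key
limit 20 offset 0; -- illustrative paging values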

SQL JOIN To Find Records That Don't Have a Matching Record With a Specific Value

I'm trying to speed up some code that I wrote years ago for my employer's purchase authorization app. Basically I have a SLOW subquery that I'd like to replace with a JOIN (if it's faster).
When the director logs into the application he sees a list of purchase requests he has yet to authorize or deny. That list is generated with the following query:
SELECT * FROM SA_ORDER WHERE ORDER_ID NOT IN
(SELECT ORDER_ID FROM SA_SIGNATURES WHERE TYPE = 'administrative director');
There are only about 900 records in sa_order and 1800 records in sa_signature and this query still takes about 5 seconds to execute. I've tried using a LEFT JOIN to retrieve records I need, but I've only been able to get sa_order records with NO matching records in sa_signature, and I need sa_order records with "no matching records with a type of 'administrative director'". Your help is greatly appreciated!
The two tables have the following layout:
CREATE TABLE sa_order
(
`order_id` BIGINT PRIMARY KEY AUTO_INCREMENT,
`order_number` BIGINT NOT NULL,
`submit_date` DATE NOT NULL,
`vendor_id` BIGINT NOT NULL,
`DENIED` BOOLEAN NOT NULL DEFAULT FALSE,
`MEMO` MEDIUMTEXT,
`year_id` BIGINT NOT NULL,
`advisor` VARCHAR(255) NOT NULL,
`deleted` BOOLEAN NOT NULL DEFAULT FALSE
);
CREATE TABLE sa_signature
(
`signature_id` BIGINT PRIMARY KEY AUTO_INCREMENT,
`order_id` BIGINT NOT NULL,
`signature` VARCHAR(255) NOT NULL,
`proxy` BOOLEAN NOT NULL DEFAULT FALSE,
`timestamp` TIMESTAMP NOT NULL DEFAULT NOW(),
`username` VARCHAR(255) NOT NULL,
`type` VARCHAR(255) NOT NULL
);
Create an index on sa_signatures (type, order_id).
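In DDL form (the index name is illustrative; note the CREATE TABLE above spells the table sa_signature):
CREATE INDEX idx_signatures_type_order ON sa_signatures (type, order_id);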
With that index in place, it is not necessary to convert the query into a LEFT JOIN (unless sa_signatures allows NULLs in order_id): the NOT IN will perform just as well. However, just in case you're curious:
SELECT o.*
FROM sa_order o
LEFT JOIN
sa_signatures s
ON s.order_id = o.order_id
AND s.type = 'administrative director'
WHERE s.type IS NULL
For the WHERE clause, pick a column that is declared NOT NULL in sa_signatures (here, s.type), so that a NULL there can only mean the order has no matching signature.
You could replace the [NOT] IN operator with EXISTS for faster performance.
So you'll have:
SELECT * FROM SA_ORDER WHERE NOT EXISTS
(SELECT ORDER_ID FROM SA_SIGNATURES
WHERE TYPE = 'administrative director'
AND ORDER_ID = SA_ORDER.ORDER_ID);
Reason: "When using “NOT IN”, the query performs nested full table scans, whereas for “NOT EXISTS”, query can use an index within the sub-query."
Source: http://decipherinfosys.wordpress.com/2007/01/21/32/
The following query should work; however, I suspect your real issue is that you don't have the proper indices in place. You should have an index on the SA_SIGNATURES table covering the ORDER_ID column.
SELECT *
FROM
SA_ORDER
LEFT JOIN
SA_SIGNATURES
ON
SA_ORDER.ORDER_ID = SA_SIGNATURES.ORDER_ID AND
TYPE = 'administrative director'
WHERE
SA_SIGNATURES.ORDER_ID IS NULL;
select * from sa_order as o inner join sa_signature as s on o.order_id = s.order_id and s.type = 'administrative director'
Also, you can create a non-clustered index on type in the sa_signature table.
Even better: have a master table for types with a typeid and a typename, and then instead of saving the type as text in your sa_signature table, save the type as an integer. That's because comparing integers is much faster than comparing text.
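A sketch of that lookup-table idea, with illustrative names, following the MySQL style of the schema above:
CREATE TABLE signature_type
(
`type_id` BIGINT PRIMARY KEY AUTO_INCREMENT,
`type_name` VARCHAR(255) NOT NULL UNIQUE
);
-- sa_signature.`type` would then become a `type_id` BIGINT column referencing
-- signature_type, and the filter becomes an integer comparison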

SQL Concat Query

I have two tables like this:
CREATE TABLE user(
id CHAR(100),
text TEXT
);
CREATE TABLE post(
postid CHAR(100),
postedby CHAR(100),
text TEXT,
FOREIGN KEY (postedby) REFERENCES user(id)
);
I need a query that, for each user, concatenates the text column of all posts of that user and puts the result in the text column of the user table. The order is not important.
What should I do?
To select the values use GROUP_CONCAT:
SELECT postedby, GROUP_CONCAT(text)
FROM post
GROUP BY postedby
To update your original table you will need to join this result with your original table using a multi-table update.
UPDATE user
LEFT JOIN
(
SELECT postedby, GROUP_CONCAT(text) AS text
FROM post
GROUP BY postedby
) T1
ON user.id = T1.postedby
SET user.text = IFNULL(T1.text, '');
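One practical caveat: MySQL truncates each GROUP_CONCAT result at group_concat_max_len, which defaults to 1024 bytes, so users with many or long posts may get cut off. Raising the limit for the session before running the UPDATE avoids that:
SET SESSION group_concat_max_len = 1000000; -- illustrative size, in bytes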

Why does this query only select a single row?

SELECT * FROM tbl_houses
WHERE
(SELECT HousesList
FROM tbl_lists
WHERE tbl_lists.ID = '123') LIKE CONCAT('% ', tbl_houses.ID, '#')
It only selects the row from tbl_houses for the last occurring tbl_houses.ID inside tbl_lists.HousesList.
I need it to select all the rows where any ID from tbl_houses exists within tbl_lists.HousesList.
It's hard to tell without knowing exactly what your data looks like, but if it only matches the last ID, it's probably because you don't have a % at the end of the pattern to allow the list to continue after the match.
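Concretely, that fix would look like this (a sketch, assuming the list entries keep the "% ID#" spacing the original pattern implies):
SELECT * FROM tbl_houses
WHERE
(SELECT HousesList
FROM tbl_lists
WHERE tbl_lists.ID = '123') LIKE CONCAT('% ', tbl_houses.ID, '#%')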
Is that a database in zeroth normal form I smell?
If you have attributes containing lists of values, like that HousesList attribute, you should instead be storing those as distinct values in a separate relation.
CREATE TABLE house (
id VARCHAR NOT NULL,
PRIMARY KEY (id)
);
CREATE TABLE list (
id VARCHAR NOT NULL,
PRIMARY KEY (id)
);
CREATE TABLE listitem (
list_id VARCHAR NOT NULL,
FOREIGN KEY (list_id) REFERENCES list (id),
house_id VARCHAR NOT NULL,
FOREIGN KEY (house_id) REFERENCES house (id),
PRIMARY KEY (list_id, house_id)
);
Then your distinct house listing values each have their own tuple, and can be selected like any other.
SELECT house.*
FROM house
JOIN listitem
ON listitem.house_id = house.id
WHERE
listitem.list_id = '123'