Simple tag searching with Sphinx - sql

For example, I have 3 tables:
documents (
id serial PRIMARY KEY,
title character varying(256),
content text,
created timestamp with time zone
);
tags (
id serial PRIMARY KEY,
tag_content character varying(128)
);
tag_assoc (
id serial PRIMARY KEY,
document_id integer,
tag_id integer
);
I would like to be able to search documents for title, content, and for tags.
My sql_query so far is very simple like:
sql_query SELECT id, title, content FROM documents
How would I set up the Sphinx sql_query so that the tags associated with each document are joined to them?

You could use a subselect with group_concat inside sql_query to retrieve them, but a better approach is to use sql_joined_field. In your case it would look like:
sql_joined_field = tags from query; \
select tag_assoc.document_id, tag_content \
from tags join tag_assoc on tags.id = tag_assoc.tag_id \
order by tag_assoc.document_id asc
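For reference, the subselect variant would put the concatenated tags straight into sql_query. A sketch (using Postgres's string_agg, since the schema above is Postgres; MySQL would use GROUP_CONCAT, and the "tags" alias is illustrative):

```sql
sql_query = SELECT d.id, d.title, d.content, \
    (SELECT string_agg(t.tag_content, ' ') \
       FROM tag_assoc ta \
       JOIN tags t ON t.id = ta.tag_id \
      WHERE ta.document_id = d.id) AS tags \
    FROM documents d
```

The joined-field approach is generally preferable because Sphinx runs the tag query once, ordered by document id, instead of evaluating a correlated subquery for every document row.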

Related

Postgresql Upsert based on condition

I have the following tables
CREATE TABLE users (
id UUID PRIMARY KEY DEFAULT uuid_generate_v4 (),
...
CREATE TABLE tags (
id UUID PRIMARY KEY DEFAULT uuid_generate_v4 (),
user_id UUID NOT NULL references users (id),
tag VARCHAR(200) NOT NULL,
...
);
I would like to form a query that inserts a tag based on the following constraints:
For a given user_id in the tags table, all entries must have unique tags
Different user_ids can have the same tag. For example:
The following should be valid in the tags table:
id              | user_id          | tag
----------------+------------------+-----
some-tag-uuid-1 | some-user-uuid-1 | foo
some-tag-uuid-2 | some-user-uuid-1 | bar
some-tag-uuid-3 | some-user-uuid-2 | foo
Note the differences in user_id.
The following should NOT be valid in the tags table:
id              | user_id          | tag
----------------+------------------+-----
some-tag-uuid-1 | some-user-uuid-1 | foo
some-tag-uuid-2 | some-user-uuid-1 | foo
If an entry exists, I should return the existing tag's id. If not, I insert the new tag
and return the new tag's id.
What I currently have
As of now, the only query I can come up with is split into two parts and the app handles the intermediate logic.
For a given tag to insert e.g.
{id: 'some-tag-uuid-1', user_id: 'some-user-uuid-1', tag: 'busy'};
SELECT id FROM tags WHERE user_id = 'some-user-uuid-1' AND tag = 'busy'
From the resulting rows I then check whether the tag exists: if so, I return the existing id; if not, I insert the new tag into the tags table and return the new id.
I'm not sure this is the best approach, and I would like a single, more performant query (if possible).
As stated by @SebDieBln:
You add a unique constraint in the tags table definition : CONSTRAINT unique_constraint UNIQUE (user_id, tag)
You add ON CONFLICT DO NOTHING in the INSERT statement
You add the RETURNING clause in the INSERT statement in order to get the new tag when inserted
But when the tag value already exists for that user_id, the INSERT returns no row, so you need to fall back to the input tag value instead.
Finally, you can do everything within an SQL function:
CREATE OR REPLACE FUNCTION test (IN _user_id UUID, INOUT _tag VARCHAR(200), OUT _rank INTEGER)
RETURNS record LANGUAGE sql AS
$$
WITH cte AS (INSERT INTO tags (user_id, tag) VALUES (_user_id, _tag) ON CONFLICT DO NOTHING RETURNING tag)
SELECT tag, 1 FROM cte
UNION
SELECT _tag, 2
ORDER BY 2
LIMIT 1;
$$;
And you call the sql function to get the expected behavior :
SELECT _tag FROM test('some-user-uuid-1', 'busy')
see the test result in dbfiddle.
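Since the question asks for the tag's id back rather than the tag text, a related single-statement pattern is a CTE that attempts the insert and falls back to selecting the pre-existing row. This is a sketch that assumes the UNIQUE (user_id, tag) constraint described above:

```sql
WITH ins AS (
    INSERT INTO tags (user_id, tag)
    VALUES ('some-user-uuid-1', 'busy')
    ON CONFLICT (user_id, tag) DO NOTHING
    RETURNING id
)
SELECT id FROM ins          -- the freshly inserted row, if any
UNION ALL
SELECT id FROM tags         -- otherwise the already-existing row
WHERE user_id = 'some-user-uuid-1' AND tag = 'busy'
LIMIT 1;
```

The outer SELECT cannot see the row inserted by its own CTE, so exactly one branch yields a row. One caveat: under heavy concurrency, a conflicting row committed by another transaction mid-statement can leave both branches empty, in which case retrying the statement is the usual remedy.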

Use several analyzers in a GIN index in Postgres

I want to create a GIN index for Postgres full-text search, and I would like to ask: if I store the analyzer name for each row of the table in a separate column called lang, is it possible to use it to create a GIN index with a different analyzer for each row, taken from this lang field?
This is what I use now. The analyzer is 'english', and it is common to every row of the indexed table:
CREATE INDEX CONCURRENTLY IF NOT EXISTS
decription_fts_gin_idx ON api_product
USING GIN(to_tsvector('english', description))
I want to do something like this:
CREATE INDEX CONCURRENTLY IF NOT EXISTS
decription_fts_gin_idx ON api_product
USING GIN(to_tsvector(api_product.lang, description))
(it doesn't work)
in order to retrieve the analyzer configuration from the lang field and use its name to populate the index.
Is it possible to do this somehow, or is it only possible to use one analyzer for the whole index?
DDL, just in case..
-- auto-generated definition
create table api_product
(
id serial not null
constraint api_product_pkey
primary key,
name varchar(100) not null,
standard varchar(40) not null,
weight integer not null
constraint api_product_weight_check
check (weight >= 0),
dimensions varchar(30) not null,
description text not null,
textsearchable_index_col tsvector,
department varchar(30) not null,
lang varchar(25) not null
);
alter table api_product
owner to postgres;
create index textsearch_idx
on api_product (textsearchable_index_col);
Query to run for seach:
SELECT *,
ts_rank_cd(to_tsvector('english', description),
to_tsquery('english', %(keyword)s), 32) as rnk
FROM api_product
WHERE to_tsvector('english', description) @@ to_tsquery('english', %(keyword)s)
ORDER BY rnk DESC, id
where 'english' would be replaced by the analyzer name from the lang field (english, french, etc.)
If you know ahead of time the language you are querying against, you could create a series of partial indexes:
CREATE INDEX CONCURRENTLY ON api_product
USING GIN(to_tsvector('english', description)) where lang='english';
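The same statement would then be repeated per language; for instance, assuming a 'french' text search configuration is also wanted:

```sql
CREATE INDEX CONCURRENTLY ON api_product
USING GIN(to_tsvector('french', description)) where lang='french';
```

The planner can pick the matching partial index only when the query repeats the same lang = '...' condition.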
Then in your query you would add the language you are searching in:
SELECT *,
ts_rank_cd(to_tsvector('english', description),
to_tsquery('english', %(keyword)s), 32) as rnk
FROM api_product
WHERE to_tsvector('english', description) @@ to_tsquery('english', %(keyword)s)
and lang='english'
ORDER BY rnk DESC, id
What you asked about is definitely possible, but you have the wrong type for the lang column:
create table api_product(description text, lang regconfig);
create index on api_product using gin (to_tsvector(lang, description));
insert into api_product VALUES ('the description', 'english');
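Applied to the table from the question, where lang is varchar, one route (a sketch) is to convert the column type first, since to_tsvector(regconfig, text) is immutable, as an index expression requires, while a text-to-regconfig cast is not:

```sql
ALTER TABLE api_product
ALTER COLUMN lang TYPE regconfig USING lang::regconfig;

-- the index name is illustrative
CREATE INDEX api_product_lang_fts_idx ON api_product
USING GIN (to_tsvector(lang, description));

-- queries then pass the per-row configuration the same way:
SELECT * FROM api_product
WHERE to_tsvector(lang, description) @@ to_tsquery('english', 'keyword');
```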

How to make a human-readable autoincrement column in PostgreSQL?

I need a column to store the serial number of orders in an online shop.
Currently, I have this one
CREATE TABLE public.orders
(
id SERIAL PRIMARY KEY NOT NULL,
title VARCHAR(100) NOT NULL
);
CREATE UNIQUE INDEX orders_id_uindex ON public.orders (id);
But I need to create a special alphanumeric format for storing this number,
like 5CC806CF751A2.
How can I create this format with Postgres capabilities?
You can create a view that simply converts the ID to a hex value:
create view readable_orders
as
select id,
to_hex(id) as readable_id,
title
from orders;
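To match the uppercase style of 5CC806CF751A2, upper() can wrap the conversion, and the hex string can be turned back into the numeric id for lookups. A sketch (the view name is illustrative):

```sql
create view readable_orders_upper
as
select id,
       upper(to_hex(id)) as readable_id,  -- e.g. id 255 becomes 'FF'
       title
from orders;

-- reverse lookup: hex string back to a bigint via a bit(64) cast
select *
from orders
where id = ('x' || lpad('5CC806CF751A2', 16, '0'))::bit(64)::bigint;
```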

PostgreSQL - key-value pair normalization

I'm trying to design a schema (on Postgres but any other SQL's fine) that supports the following requirements:
Each document (a row in the documents table) has a unique string ID (the id field) and several other data fields.
Each document can have 0 or more tags (string key-value pairs) attached to it, and the goal is to build a system that lets users sort or filter documents using those key-value pairs. E.g. "Show me all documents that have a tag "key1" with value "value1", and sort the output using the value of tag "key3"."
So the DDL should look like this (simplified):
create table documents
(
id char(32) not null
constraint documents_pkey
primary key,
data varchar(2000),
created_at timestamp,
updated_at timestamp
);
create table document_tags
(
id serial not null
constraint document_tags_pkey
primary key,
document_id char(32) not null
constraint document_tags_documents_id_fk
references documents
on update cascade on delete cascade,
tag_key varchar(200) not null,
tag_value varchar(2000) not null
);
Now my question is how can I build a query that does filtering/sorting using the tag key values? E.g. Returns all documents (possibly with LIMIT/OFFSET) that does have "key1" = "value1" tag and "key2" = "value2" tags, sorted by the value of "key3" tag.
You can use group by and having:
select dt.document_id
from document_tags dt
group by dt.document_id
having bool_or(dt.tag_key = 'key1' and dt.tag_value = 'value1')
   and bool_or(dt.tag_key = 'key2' and dt.tag_value = 'value2')
order by max(case when dt.tag_key = 'key3' then dt.tag_value end);
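To return the full document rows with paging, that aggregation can sit in a derived table. This sketch filters on key1/key2, sorts by key3's value, and uses Postgres's bool_or (the limit/offset values are illustrative):

```sql
SELECT d.*
FROM documents d
JOIN (
    SELECT dt.document_id,
           max(CASE WHEN dt.tag_key = 'key3' THEN dt.tag_value END) AS sort_key
    FROM document_tags dt
    GROUP BY dt.document_id
    HAVING bool_or(dt.tag_key = 'key1' AND dt.tag_value = 'value1')
       AND bool_or(dt.tag_key = 'key2' AND dt.tag_value = 'value2')
) f ON f.document_id = d.id
ORDER BY f.sort_key
LIMIT 20 OFFSET 0;
```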

Why does this query only select a single row?

SELECT * FROM tbl_houses
WHERE
(SELECT HousesList
FROM tbl_lists
WHERE tbl_lists.ID = '123') LIKE CONCAT('% ', tbl_houses.ID, '#')
It only selects the row from tbl_houses with the last-occurring tbl_houses.ID inside tbl_lists.HousesList.
I need it to select all the rows where any ID from tbl_houses exists within tbl_lists.HousesList.
It's hard to tell without knowing exactly what your data looks like, but if it only matches the last ID, it's probably because you don't have any % at the end of the string, so as to allow for the list to continue after the match.
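Assuming the list format implied by the original pattern (each entry introduced by '% ' and terminated by '#'), adding a trailing wildcard lets the list continue after the matched entry. A sketch against the asker's schema:

```sql
SELECT *
FROM tbl_houses
WHERE (SELECT HousesList
       FROM tbl_lists
       WHERE tbl_lists.ID = '123') LIKE CONCAT('% ', tbl_houses.ID, '#%');
```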
Is that a database in zeroth normal form I smell?
If you have attributes containing lists of values, like that HousesList attribute, you should instead be storing those as distinct values in a separate relation.
CREATE TABLE house (
id VARCHAR NOT NULL,
PRIMARY KEY (id)
);
CREATE TABLE list (
id VARCHAR NOT NULL,
PRIMARY KEY (id)
);
CREATE TABLE listitem (
list_id VARCHAR NOT NULL,
house_id VARCHAR NOT NULL,
PRIMARY KEY (list_id, house_id),
FOREIGN KEY (list_id) REFERENCES list (id),
FOREIGN KEY (house_id) REFERENCES house (id)
);
Then your distinct house listing values each have their own tuple, and can be selected like any other.
SELECT house.*
FROM house
JOIN listitem
ON listitem.house_id = house.id
WHERE
listitem.list_id = '123'