PostgreSQL retype in index

How can I create an index in PostgreSQL like:
CREATE INDEX i_users_user_id
ON users
USING btree (user_id::character varying);
I want the integer column to behave like a string column:
SELECT * FROM vw_users WHERE user_id='string'
'string' is some value; I don't know whether it is a user_id or a session_id, and I want to use only one query. :)
vw_users is:
SELECT user_id::character varying FROM users
UNION
SELECT session_id as user_id FROM temp_users
Tables are:
CREATE TABLE users (user_id integer)
CREATE TABLE temp_users (session_id character varying)

An index on an expression requires an extra set of parentheses:
CREATE INDEX i_users_user_id
ON users
USING btree ((user_id::character varying));
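Note that the planner will only consider an expression index when the query repeats the same expression. A sketch of a query that can use the index above:

```sql
-- The cast must appear in the predicate exactly as in the index definition.
SELECT * FROM users
WHERE user_id::character varying = 'string';
```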


Use multiple analyzers in a GIN index in Postgres

I want to create a GIN index for Postgres full-text search. If I store the analyzer name for each row of the table in a separate column called lang, is it possible to build the GIN index with a different analyzer per row, taken from that lang field?
This is what I use now. The analyzer is 'english', and it is the same for every row in the indexed table.
CREATE INDEX CONCURRENTLY IF NOT EXISTS
decription_fts_gin_idx ON api_product
USING GIN(to_tsvector('english', description))
I want to do something like this:
CREATE INDEX CONCURRENTLY IF NOT EXISTS
decription_fts_gin_idx ON api_product
USING GIN(to_tsvector(api_product.lang, description))
(it doesn't work)
in order to retrieve the analyzer configuration from the lang field and use it to build the index.
Is it possible to do this somehow, or can only one analyzer be used for the whole index?
The DDL, just in case:
-- auto-generated definition
create table api_product
(
id serial not null
constraint api_product_pkey
primary key,
name varchar(100) not null,
standard varchar(40) not null,
weight integer not null
constraint api_product_weight_check
check (weight >= 0),
dimensions varchar(30) not null,
description text not null,
textsearchable_index_col tsvector,
department varchar(30) not null,
lang varchar(25) not null
);
alter table api_product
owner to postgres;
create index textsearch_idx
on api_product (textsearchable_index_col);
Query to run for search:
SELECT *,
ts_rank_cd(to_tsvector('english', description),
to_tsquery('english', %(keyword)s), 32) as rnk
FROM api_product
WHERE to_tsvector('english', description) @@ to_tsquery('english', %(keyword)s)
ORDER BY rnk DESC, id
where 'english' would be replaced by the analyzer name stored in the lang field (english, french, etc.)
If you know ahead of time the language you are querying against, you could create a series of partial indexes:
CREATE INDEX CONCURRENTLY ON api_product
USING GIN(to_tsvector('english', description)) where lang='english';
Then in your query you would add the language you are searching in:
SELECT *,
ts_rank_cd(to_tsvector('english', description),
to_tsquery('english', %(keyword)s), 32) as rnk
FROM api_product
WHERE to_tsvector('english', description) @@ to_tsquery('english', %(keyword)s)
and lang='english'
ORDER BY rnk DESC, id
What you asked about is definitely possible, but you have the wrong type for the lang column:
create table api_product(description text, lang regconfig);
create index on api_product using gin (to_tsvector(lang, description));
insert into api_product VALUES ('the description', 'english');
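With lang typed as regconfig, a query can pass the column straight to to_tsvector and to_tsquery, so each row is matched with its own configuration. A sketch along the lines of the original search query ('keyword' stands in for the actual search term):

```sql
SELECT *,
       ts_rank_cd(to_tsvector(lang, description),
                  to_tsquery(lang, 'keyword'), 32) AS rnk
FROM api_product
WHERE to_tsvector(lang, description) @@ to_tsquery(lang, 'keyword')
ORDER BY rnk DESC, id;
```

Because the predicate uses the same expression as the index, to_tsvector(lang, description), the GIN index can be used.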

How to create GIN index with LOWER in PostgreSQL?

First of all, I use a JPA ORM (EclipseLink) which doesn't support ILIKE, so I am looking for another way to get case-insensitive search. I did the following:
CREATE TABLE IF NOT EXISTS users (
id SERIAL NOT NULL,
name VARCHAR(512) NOT NULL,
PRIMARY KEY (id));
CREATE INDEX users_name_idx ON users USING gin (LOWER(name) gin_trgm_ops);
INSERT INTO users (name) VALUES ('User Full Name');
However, this query returns the user:
SELECT * FROM users WHERE name ILIKE '%full%';
But this one doesn't:
SELECT * FROM users WHERE name LIKE '%full%';
So, how to create GIN index with LOWER in PostgreSQL?
I'm not sure I understand the question, because you mention GIN, insert one row, and expect it to be returned by a case-insensitive comparison. A wild guess: maybe you are looking for citext?
t=# create extension citext;
CREATE EXTENSION
t=# CREATE TABLE IF NOT EXISTS users (
id SERIAL NOT NULL,
name citext NOT NULL,
PRIMARY KEY (id));
CREATE TABLE
t=# INSERT INTO users (name) VALUES ('User Full Name');
INSERT 0 1
t=# SELECT * FROM users WHERE name LIKE '%full%';
id | name
----+----------------
1 | User Full Name
(1 row)
Update: an expression-based index requires the same expression in the query. With the index built on LOWER(name), the query has to use it too:
SELECT * FROM users WHERE LOWER(name) LIKE '%full%';
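Putting the pieces together for the trigram route: a minimal sketch, assuming the pg_trgm extension is available (the index name here is illustrative), showing the index and a query that repeats the indexed expression:

```sql
CREATE EXTENSION IF NOT EXISTS pg_trgm;

CREATE INDEX users_name_trgm_idx ON users USING gin (LOWER(name) gin_trgm_ops);

-- LOWER() on both the column and the pattern, matching the index expression:
SELECT * FROM users WHERE LOWER(name) LIKE LOWER('%Full%');
```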

Postgres: fast check whether an attribute combination also exists in another table

I want to check whether the same pair of attribute values exists in two different tables. If a combination from table_a does not exist in table_b, it should appear in the result of the SELECT statement. Right now I have the following query, which works:
CREATE TABLE table_a (
attr_a integer,
attr_b text,
uuid character varying(200),
CONSTRAINT table_a_pkey PRIMARY KEY (uuid)
);
CREATE TABLE table_b (
attr_a integer,
attr_b text,
uuid character varying(200),
CONSTRAINT table_b_pkey PRIMARY KEY (uuid)
);
SELECT * FROM table_a
WHERE (table_a.attr_a::text || table_a.attr_b::text) != ALL(SELECT (table_b.attr_a::text || table_b.attr_b::text) FROM table_b)
However, the execution time is quite long, so I would like to ask whether there is a faster way to perform this check.
Your WHERE clause applies a manipulation to attr_a (casting it to text and concatenating it with attr_b), so no index can be used. Instead of this concatenation, why not try a straightforward EXISTS operator?
SELECT *
FROM table_a a
WHERE NOT EXISTS (SELECT *
FROM table_b b
WHERE a.attr_a = b.attr_a AND
a.attr_b = b.attr_b)
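For the EXISTS form to be fast, the inner lookup can be supported by a composite index on the two compared columns; a sketch (the index name is illustrative):

```sql
CREATE INDEX table_b_attr_a_attr_b_idx ON table_b (attr_a, attr_b);
```

With this index, each NOT EXISTS probe becomes an index lookup instead of a scan of table_b.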

SQLite performance tuning for paginated fetches

I am trying to optimize the query I use for fetching paginated data from a database with large data sets.
My schema looks like this:
CREATE TABLE users (
user_id TEXT PRIMARY KEY,
name TEXT,
custom_fields TEXT
);
CREATE TABLE events (
event_id TEXT PRIMARY KEY,
organizer_id TEXT NOT NULL REFERENCES users(user_id) ON DELETE SET NULL ON UPDATE CASCADE,
name TEXT NOT NULL,
type TEXT NOT NULL,
start_time INTEGER,
duration INTEGER
-- more columns here, omitted for the sake of simplicity
);
CREATE INDEX events_organizer_id_start_time_idx ON events(organizer_id, start_time);
CREATE INDEX events_organizer_id_type_idx ON events(organizer_id, type);
CREATE INDEX events_organizer_id_type_start_time_idx ON events(organizer_id, type, start_time);
CREATE INDEX events_type_start_time_idx ON events(type, start_time);
CREATE INDEX events_start_time_desc_idx ON events(start_time DESC);
CREATE INDEX events_start_time_asc_idx ON events(IFNULL(start_time, 253402300800) ASC);
CREATE TABLE event_participants (
participant_id TEXT NOT NULL REFERENCES users(user_id) ON DELETE CASCADE ON UPDATE CASCADE,
event_id TEXT NOT NULL REFERENCES events(event_id) ON DELETE CASCADE ON UPDATE CASCADE,
role INTEGER NOT NULL DEFAULT 0,
UNIQUE (participant_id, event_id) ON CONFLICT REPLACE
);
CREATE INDEX event_participants_participant_id_event_id_idx ON event_participants(participant_id, event_id);
CREATE INDEX event_participants_event_id_idx ON event_participants(event_id);
CREATE TABLE event_tag_maps (
event_id TEXT NOT NULL REFERENCES events(event_id) ON DELETE CASCADE ON UPDATE CASCADE,
tag_id TEXT NOT NULL,
PRIMARY KEY (event_id, tag_id) ON CONFLICT IGNORE
);
CREATE INDEX event_tag_maps_event_id_tag_id_idx ON event_tag_maps(event_id, tag_id);
Where in events table I have around 1,500,000 entries, and around 2,000,000 in event_participants.
Now, a typical query would look something like:
SELECT
EVTS.event_id,
EVTS.type,
EVTS.name,
EVTS.start_time,
EVTS.duration
FROM events AS EVTS
WHERE
EVTS.organizer_id IN(
'f39c3bb1-3ee3-11e6-a0dc-005056c00008',
'4555e70f-3f1d-11e6-a0dc-005056c00008',
'6e7e33ae-3f1c-11e6-a0dc-005056c00008',
'4850a6a0-3ee4-11e6-a0dc-005056c00008',
'e06f784c-3eea-11e6-a0dc-005056c00008',
'bc6a0f73-3f1d-11e6-a0dc-005056c00008',
'68959fb5-3ef3-11e6-a0dc-005056c00008',
'c4c96cf2-3f1a-11e6-a0dc-005056c00008',
'727e49d1-3f1b-11e6-a0dc-005056c00008',
'930bcfb6-3f09-11e6-a0dc-005056c00008')
AND EVTS.type IN('Meeting', 'Conversation')
AND(
EXISTS (
SELECT 1 FROM event_tag_maps AS ETM WHERE ETM.event_id = EVTS.event_id AND
ETM.tag_id IN ('00000000-0000-0000-0000-000000000000', '6ae6870f-1aac-11e6-aeb9-005056c00008', '6ae6870c-1aac-11e6-aeb9-005056c00008', '1f6d3ccb-eaed-4068-a46b-ec2547fec1ff'))
OR NOT EXISTS (
SELECT 1 FROM event_tag_maps AS ETM WHERE ETM.event_id = EVTS.event_id)
)
AND EXISTS (
SELECT 1 FROM event_participants AS EPRTS
WHERE
EVTS.event_id = EPRTS.event_id
AND participant_id NOT IN('79869516-3ef2-11e6-a0dc-005056c00008', '79869515-3ef2-11e6-a0dc-005056c00008', '79869516-4e18-11e6-a0dc-005056c00008')
)
ORDER BY IFNULL(EVTS.start_time, 253402300800) ASC
LIMIT 100 OFFSET #Offset;
Also, for fetching the overall count of matching items, I use the above query with count(1) instead of the column list and without the ORDER BY and LIMIT/OFFSET clauses.
I experience two main problems here:
1) Performance decreases drastically as I increase the #Offset value. The difference is very significant: from almost immediate to a number of seconds.
2) The count query takes a long time (number of seconds) and produces the following execution plan:
0|0|0|SCAN TABLE events AS EVTS
0|0|0|EXECUTE LIST SUBQUERY 1
0|0|0|EXECUTE LIST SUBQUERY 1
0|0|0|EXECUTE CORRELATED SCALAR SUBQUERY 1
1|0|0|SEARCH TABLE event_tag_maps AS ETM USING COVERING INDEX event_tag_maps_event_id_tag_id_idx (event_id=? AND tag_id=?)
1|0|0|EXECUTE LIST SUBQUERY 2
0|0|0|EXECUTE CORRELATED SCALAR SUBQUERY 2
2|0|0|SEARCH TABLE event_tag_maps AS ETM USING COVERING INDEX event_tag_maps_event_id_tag_id_idx (event_id=?)
0|0|0|EXECUTE CORRELATED SCALAR SUBQUERY 3
3|0|0|SEARCH TABLE event_participants AS EPRTS USING INDEX event_participants_event_id_idx (event_id=?)
Here I don't understand why a full table scan is performed instead of an index scan.
Additional info and SQLite settings used:
I use System.Data.SQLite provider (have to, because of custom functions support)
Page size = cluster size (4096 in my case)
Cache size = 100000
Journal mode = WAL
Temp store = 2 (memory)
No transaction is open for the query
Is there anything I could do to change the query/schema or settings in order to get as much performance improvement as possible?
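One common mitigation for the OFFSET slowdown described above is keyset (seek) pagination: instead of skipping #Offset rows, remember the sort key of the last row on the current page and filter past it. A hedged sketch against the schema above, using event_id as a tie-breaker (row-value comparisons require SQLite 3.15 or later; :last_time and :last_key are placeholders taken from the last row of the previous page):

```sql
SELECT event_id, type, name, start_time, duration
FROM events
WHERE (IFNULL(start_time, 253402300800), event_id) > (:last_time, :last_key)
  -- AND ...the original organizer/type/tag/participant filters...
ORDER BY IFNULL(start_time, 253402300800) ASC, event_id ASC
LIMIT 100;
```

This keeps the per-page cost roughly constant regardless of how deep into the result set the page is.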

sqlite3 explain table_name

In MySQL you can view a table's structure via explain tablename;. What is the equivalent for sqlite3?
I believe ".schema tablename" is what you're looking for.
You can use .schema in the Command Line Shell:
With no arguments, the ".schema" command shows the original CREATE TABLE and CREATE INDEX statements that were used to build the current database. If you give the name of a table to ".schema", it shows the original CREATE statement used to make that table and all of its indices.
This was already answered in a more generic way here.
Edit:
Note that .schema will also print the indexes defined on the matching table.
Example:
CREATE TABLE job (
id INTEGER PRIMARY KEY,
data VARCHAR
);
CREATE TABLE job_name (
id INTEGER PRIMARY KEY,
name VARCHAR
);
CREATE INDEX job_idx on job(data);
Note the differences between:
sqlite> SELECT sql FROM SQLITE_MASTER WHERE type = 'table' AND name = 'job';
CREATE TABLE job (
id INTEGER PRIMARY KEY,
data VARCHAR
)
sqlite> SELECT sql FROM SQLITE_MASTER WHERE name = 'job_idx';
CREATE INDEX job_idx on job(data)
and
sqlite> .schema job
CREATE TABLE job (
id INTEGER PRIMARY KEY,
data VARCHAR
);
CREATE INDEX job_idx on job(data);
Note that .schema includes the semicolons at the end of the statements, while querying SQLITE_MASTER does not.