postgres fast check if attribute combination also exists in another table - sql

I want to check whether the same combination of two attribute values exists in two different tables. If a combination from table_a does not exist in table_b, that row should be included in the result of the SELECT statement. Right now I have the following query, which is working:
CREATE TABLE table_a (
attr_a integer,
attr_b text,
uuid character varying(200),
CONSTRAINT table_a_pkey PRIMARY KEY (uuid)
);
CREATE TABLE table_b (
attr_a integer,
attr_b text,
uuid character varying(200),
CONSTRAINT table_b_pkey PRIMARY KEY (uuid)
);
SELECT * FROM table_a
WHERE (table_a.attr_a::text || table_a.attr_b::text) != ALL(SELECT (table_b.attr_a::text || table_b.attr_b::text) FROM table_b);
However, the execution time is pretty long. So I would like to ask if there is a faster solution to check for that.

Your WHERE clause manipulates attr_a (casting it to text and concatenating it with attr_b), so an index on the column can't be used. Instead of this concatenation, why not try a straightforward NOT EXISTS?
SELECT *
FROM table_a a
WHERE NOT EXISTS (SELECT *
FROM table_b b
WHERE a.attr_a = b.attr_a AND
a.attr_b = b.attr_b);
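
To make the difference concrete, here is a minimal sketch comparing a column-wise NOT EXISTS with the concatenation approach. It uses Python's built-in sqlite3 module as a stand-in for Postgres (an assumption, chosen only so the example runs anywhere); the sample rows, the index name table_b_pair_idx and the two helper functions are made up for illustration. SQLite has no `!= ALL`, so the concatenation variant is written with NOT IN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (attr_a INTEGER, attr_b TEXT, uuid TEXT PRIMARY KEY);
    CREATE TABLE table_b (attr_a INTEGER, attr_b TEXT, uuid TEXT PRIMARY KEY);
    -- Hypothetical composite index so the anti-join can probe table_b by both columns.
    CREATE INDEX table_b_pair_idx ON table_b (attr_a, attr_b);
    INSERT INTO table_a VALUES (1, '23', 'a1'), (2, 'x', 'a2'), (3, 'y', 'a3');
    INSERT INTO table_b VALUES (12, '3', 'b1'), (2, 'x', 'b2');
""")

def missing_by_not_exists(conn):
    """Rows of table_a whose (attr_a, attr_b) pair has no match in table_b."""
    return conn.execute("""
        SELECT a.attr_a, a.attr_b
        FROM table_a a
        WHERE NOT EXISTS (SELECT 1
                          FROM table_b b
                          WHERE a.attr_a = b.attr_a
                            AND a.attr_b = b.attr_b)
        ORDER BY a.uuid
    """).fetchall()

def missing_by_concat(conn):
    """Same question, answered by comparing concatenated strings."""
    return conn.execute("""
        SELECT a.attr_a, a.attr_b
        FROM table_a a
        WHERE CAST(a.attr_a AS TEXT) || a.attr_b NOT IN
              (SELECT CAST(b.attr_a AS TEXT) || b.attr_b FROM table_b b)
        ORDER BY a.uuid
    """).fetchall()

print(missing_by_not_exists(conn))  # [(1, '23'), (3, 'y')]
print(missing_by_concat(conn))      # [(3, 'y')]
```

Besides defeating the index, the concatenated comparison treats (1, '23') and (12, '3') as the same value because both produce '123', so it silently drops a row that the column-wise comparison reports correctly.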

Related

On conflict do nothing with a custom constraint

I need to do the following:
insert into table_a (primarykey_field, other_field)
select primarykey_field, other_field from table_b b
on conflict (primarykey_field) where primarykey_field >>= b.primarykey_field do nothing;
Never mind the exact operator in my WHERE condition; it could be anything other than a simple equality. In my case I'm using a custom IP range field, so I want to check that one IP address is not in the range of the other IP address when I'm inserting a new row.
Is there a way I can do this with on conflict or with another query?
You can filter out all rows which have a pkey_ip_range that's already contained by an existing pkey_ip_range:
insert into table_a as a (
pkey_ip_range,
other_field)
select pkey_ip_range,
other_field
from table_b b
where not exists (
select 1
from table_a
where table_a.pkey_ip_range >>= b.pkey_ip_range);
If you wanted to check if the incoming ip range either contains or is contained by the existing ip range (&& rather than >>=), you can use an exclusion constraint:
drop table if exists table_a;
create table table_a (
pkey_ip_range inet primary key,
other_column text);
alter table table_a
add constraint table_a_no_contained_ip_ranges
exclude using gist (pkey_ip_range inet_ops WITH &&);
insert into table_a
(pkey_ip_range,other_column)
values ('192.168.0.0/31','abc');
insert into table_a
(pkey_ip_range,other_column)
values ('192.168.0.0/30','def');
--ERROR: conflicting key value violates exclusion constraint "table_a_no_contained_ip_ranges"
--DETAIL: Key (pkey_ip_range)=(192.168.0.0/30) conflicts with existing key (pkey_ip_range)=(192.168.0.0/31).
insert into table_a
(pkey_ip_range,other_column)
values ('192.168.0.0/32','ghi')
on conflict do nothing;
--table table_a;
-- pkey_ip_range | other_column
------------------+--------------
-- 192.168.0.0/31 | abc
--(1 row)
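
The difference between the two checks can be sketched outside the database with Python's standard ipaddress module. This only illustrates the semantics under the assumption that `subnet_of` roughly mirrors "existing >>= incoming" and `overlaps` roughly mirrors "&&"; the helper names are hypothetical and nothing here talks to Postgres.

```python
import ipaddress

def contained_by(incoming, existing):
    # Rough analogue of `existing >>= incoming`: incoming lies inside (or equals) existing.
    return incoming.subnet_of(existing)

def overlaps(incoming, existing):
    # Rough analogue of `existing && incoming`: either range contains the other.
    return incoming.overlaps(existing)

def conflicts(cidr, existing_ranges, check):
    """True if the incoming CIDR conflicts with any stored range under `check`."""
    incoming = ipaddress.ip_network(cidr)
    return any(check(incoming, existing) for existing in existing_ranges)

# Same starting state as the example above: one stored row, 192.168.0.0/31.
existing_ranges = [ipaddress.ip_network("192.168.0.0/31")]

for cidr in ("192.168.0.0/30", "192.168.0.0/32", "10.0.0.0/8"):
    print(cidr,
          conflicts(cidr, existing_ranges, contained_by),
          conflicts(cidr, existing_ranges, overlaps))
# 192.168.0.0/30 False True
# 192.168.0.0/32 True True
# 10.0.0.0/8 False False
```

The wider /30 range is only rejected by the overlap check, which is why the exclusion constraint with && is the stricter of the two rules.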

PostgreSQL: Select dynamic column in correlated subquery

I'm using the Entity-Attribute-Value (EAV) pattern to store 'overrides' for target objects. That is, there are three tables:
Entity, contains the target records
Attribute, contains the column names of 'overridable' columns in the Entity table
Override, contains the EAV records
What I'd like to do is select Overrides along with the value of the 'overridden' column from the Entity table, which requires dynamic use of the Attribute name in the SQL.
My naive attempt in (PostgreSQL) SQL:
SELECT
OV.entity_id as entity,
AT.name as attribute,
OV.value as value,
ENT.base_value as base_value
FROM "override" AS OV
LEFT JOIN "attribute" as AT
ON (OV.attribute_id = AT.id)
LEFT JOIN LATERAL (
SELECT
id,
AT.name as base_value -- AT.name doesn't resolve to a SQL identifier
FROM "entity"
) AS ENT
ON ENT.id = OV.entity_id;
This doesn't work as AT.name doesn't resolve to a SQL identifier and simply returns column names such as 'col1', 'col2', etc. rather than querying Entity with the column name.
I'm aware this is dynamic SQL, but I'm pretty new to PL/pgSQL and couldn't figure out how to do it, since the subquery is correlated (lateral joined). Also, is this even possible given that the columns are not homogeneously typed? Note that all the 'values' in the Override table are stored as strings to get round this problem.
Any help would be most appreciated!
You can use PL/pgSQL to dynamically request the columns. I'm assuming the following simplified database structure (all original and override values are "character varying" in this example, as I didn't find any further type information):
CREATE TABLE public.entity (
id integer NOT NULL DEFAULT nextval('entity_id_seq'::regclass),
attr1 character varying,
attr2 character varying,
<...>
CONSTRAINT entity_pkey PRIMARY KEY (id)
);
CREATE TABLE public.attribute (
id integer NOT NULL DEFAULT nextval('attribute_id_seq'::regclass),
name character varying,
CONSTRAINT attribute_pkey PRIMARY KEY (id)
);
CREATE TABLE public.override (
entity_id integer NOT NULL,
attribute_id integer NOT NULL,
value character varying,
CONSTRAINT override_pkey PRIMARY KEY (entity_id, attribute_id),
CONSTRAINT override_attribute_id_fkey FOREIGN KEY (attribute_id)
REFERENCES public.attribute (id),
CONSTRAINT override_entity_id_fkey FOREIGN KEY (entity_id)
REFERENCES public.entity (id));
With the PL/pgSQL function
create or replace function get_base_value(
entity_id integer,
column_identifier character varying
)
returns setof character varying
language plpgsql as $$
declare
begin
-- format() with %I quotes the column name as an identifier; the id is passed as a parameter
return query execute format('SELECT %I FROM entity WHERE id = $1', column_identifier) using entity_id;
end $$;
you can use almost exactly your query:
SELECT
OV.entity_id as entity,
AT.name as attribute,
OV.value as value,
ENT.base_value as base_value
FROM "override" AS OV
LEFT JOIN "attribute" as AT
ON (OV.attribute_id = AT.id)
-- the function returns a single column, so it is called directly and the column is named via the alias
LEFT JOIN LATERAL get_base_value(OV.entity_id, AT.name) AS ENT(base_value)
ON true;
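
The same idea (look up a column whose name comes from data, without splicing unchecked text into the SQL) can be sketched on the client side. This is an illustration only: it uses Python's built-in sqlite3 module rather than PL/pgSQL, and the sample rows, the alias `attr` and the function `overrides_with_base` are assumptions made for the example. The column name is checked against the real columns of "entity" before it is quoted and used.

```python
import sqlite3

def get_base_value(conn, entity_id, column):
    """Return entity.<column> for one entity, or None if the row is missing."""
    # Only accept names that are real columns of "entity"; never splice raw input.
    allowed = {row[1] for row in conn.execute('PRAGMA table_info("entity")')}
    if column not in allowed:
        raise ValueError(f"unknown column: {column!r}")
    quoted = '"' + column.replace('"', '""') + '"'
    row = conn.execute(
        f"SELECT {quoted} FROM entity WHERE id = ?", (entity_id,)
    ).fetchone()
    return None if row is None else row[0]

def overrides_with_base(conn):
    """List (entity, attribute, override value, base value) for every override."""
    rows = conn.execute("""
        SELECT ov.entity_id, attr.name, ov.value
        FROM override AS ov
        JOIN attribute AS attr ON ov.attribute_id = attr.id
        ORDER BY ov.entity_id, attr.id
    """).fetchall()
    return [(eid, name, value, get_base_value(conn, eid, name))
            for eid, name, value in rows]

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE entity (id INTEGER PRIMARY KEY, attr1 TEXT, attr2 TEXT);
    CREATE TABLE attribute (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE override (entity_id INTEGER, attribute_id INTEGER, value TEXT,
                           PRIMARY KEY (entity_id, attribute_id));
    INSERT INTO entity VALUES (1, 'red', 'small'), (2, 'blue', 'large');
    INSERT INTO attribute VALUES (1, 'attr1'), (2, 'attr2');
    INSERT INTO override VALUES (1, 2, 'medium'), (2, 1, 'green');
""")

print(overrides_with_base(conn))
# [(1, 'attr2', 'medium', 'small'), (2, 'attr1', 'green', 'blue')]
```

Validating the name against the table's actual columns matters because the attribute names live in an ordinary table that anyone with write access could change.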

Update each row of a table with the corresponding value

I have two Postgres tables:
create table A(
id_A serial not null,
column_A varchar null,
...);
create table B(
id_B serial not null,
id_A int4 not null,
name varchar null,
keywords varchar null,
...);
An element of table A is associated to multiple elements of table B and an element of table B is associated to one element of table A.
The column keywords in table B is a concatenation of values of columns B.name and A.column_A:
B.keywords := B.name || A.column_A
How to update with a trigger the column B.keywords of each row in table B if the value of A.column_A is updated?
In other words, I want to do something like this (pseudo-code):
FOR EACH ROW current_row IN TABLE B
UPDATE B SET keywords = (SELECT B.name || A.column_A
FROM B INNER JOIN A ON B.id_A = A.id_A
WHERE B.id_B = current_row.id_B)
WHERE id_B = current_row.id_B;
Your trigger has to call a function when A is updated:
CREATE OR REPLACE FUNCTION update_b()
RETURNS TRIGGER
AS $$
BEGIN
UPDATE B
SET keywords = name || NEW.column_A
WHERE id_A = NEW.id_A;
return NEW;
END
$$ LANGUAGE plpgsql;
CREATE TRIGGER update_b_trigger AFTER UPDATE OF column_A
ON A
FOR EACH ROW
EXECUTE PROCEDURE update_b();
It might also be useful to add a trigger BEFORE INSERT OR UPDATE on table B to set the keywords.
Your approach is broken by design. Do not try to keep derived values current in the table. That's not safe for concurrent access. All kinds of complications can arise. You bloat table B (and backups) and impair write performance.
Instead, use a VIEW (or a MATERIALIZED VIEW):
CREATE VIEW ab AS
SELECT B.*, concat_ws(', ', B.name, A.column_A) AS keywords
FROM B
LEFT JOIN A USING (id_A);
With the updated table definition below, referential integrity is guaranteed and you can use [INNER] JOIN instead of LEFT [OUTER] JOIN.
Or even a simple query might be enough ...
Either way, you need a PRIMARY KEY constraint in table A and a FOREIGN KEY constraint in table B:
CREATE TABLE A (
id_A serial PRIMARY KEY,
column_A varchar
...);
CREATE TABLE B (
id_B serial PRIMARY KEY,
id_A int4 NOT NULL REFERENCES A(id_A),
name varchar
-- and *no* redundant "keywords" column!
...);
About concatenating strings:
How to concatenate columns in a Postgres SELECT?
And I wouldn't use CaMeL-case identifiers:
Are PostgreSQL column names case-sensitive?
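
A minimal sketch of the view-based approach, to show that the derived keywords follow an update of A without any trigger. It runs on Python's built-in sqlite3 module as a stand-in for Postgres (an assumption made so the example is self-contained); the sample rows are invented, the identifiers are lower-cased, and plain `||` is used instead of concat_ws because older SQLite versions lack that function.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (
        id_a     INTEGER PRIMARY KEY,
        column_a TEXT
    );
    CREATE TABLE b (
        id_b INTEGER PRIMARY KEY,
        id_a INTEGER NOT NULL REFERENCES a(id_a),
        name TEXT
        -- no stored "keywords" column; it is derived in the view
    );
    CREATE VIEW ab AS
    SELECT b.id_b, b.id_a, b.name,
           b.name || ', ' || a.column_a AS keywords
    FROM b
    JOIN a USING (id_a);

    INSERT INTO a VALUES (1, 'old');
    INSERT INTO b VALUES (10, 1, 'first'), (11, 1, 'second');
""")

def keywords(conn):
    """Current keywords for every row of b, as computed by the view."""
    return [row[0] for row in conn.execute("SELECT keywords FROM ab ORDER BY id_b")]

before = keywords(conn)
conn.execute("UPDATE a SET column_a = 'new' WHERE id_a = 1")
after = keywords(conn)

print(before)  # ['first, old', 'second, old']
print(after)   # ['first, new', 'second, new']
```

Because the value is computed at read time, there is no window in which B holds stale keywords, which is the concurrency concern raised above.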

Can these three SQLITE INSERTS be combined or improved?

I have three tables:
CREATE TABLE "local" ("id" INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL , "serialNumber" TEXT, "location" TEXT)
CREATE TABLE "setups" ("id" INTEGER PRIMARY KEY NOT NULL ,"hold" TEXT,"mode" INTEGER,"setTemp" REAL,"maxSTemp" REAL,"minSTemp" REAL,"units" TEXT,"heat" INTEGER,"heatMode" INTEGER,"fanMode" INTEGER,"fan" INTEGER,"cool" INTEGER)
CREATE TABLE "data" ("id" INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL ,"humidity" REAL,"time" INTEGER,"filtChng" INTEGER,"indoorTemp" REAL,"outdoorTemp" REAL, "setups_id" INTEGER, "local_id" INTEGER)
Every time I get a new entry I execute:
INSERT INTO local ('serialNumber') SELECT 'XXXX' WHERE NOT EXISTS (SELECT * FROM local WHERE serialNumber='XXXX')
INSERT INTO setups ('hold','mode','setTemp','maxSTemp','minSTemp','units','heat','heatMode','fanMode','fan','cool') SELECT '00',1,74.0,74.0,74.0,'F',1,1,1,1,1 WHERE NOT EXISTS (SELECT * FROM setups WHERE hold='00' AND mode=1 AND setTemp=74.0 AND maxSTemp=74.0 AND minSTemp=74.0 AND units='F' AND heat=1 AND heatMode=1 AND fanMode=1 AND fan=1 AND cool=1)
INSERT INTO data ('humidity','filtChng','time','indoorTemp','outdoorTemp',local_id,setups_id) SELECT 74.0,111111111,100,74.0,74.0,local.id,setups.id FROM local CROSS JOIN setups WHERE local.serialNumber='XXXX' AND setups.hold='00' AND setups.mode=1 AND setups.setTemp=74.0 AND setups.maxSTemp=74.0 AND setups.minSTemp=74.0 AND setups.units='F' AND setups.heat=1 AND setups.heatMode=1 AND setups.fanMode=1 AND setups.fan=1 AND setups.cool=1
What I am doing works, but seems slow and redundant/inefficient...
Well, you can remove the "where not exists" part from the "local" insert if you use a unique constraint on the "serialNumber" field. Be careful, this will throw a constraint violation instead of just not inserting the row. So be sure to handle that in the application.
And though I assume it is, be sure that checking for duplicates is really necessary in your app.
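
A small sketch of the unique-constraint variant, using Python's built-in sqlite3 module. The helper name `local_id` is hypothetical, and the use of SQLite's INSERT OR IGNORE conflict clause is an addition to the answer above: it skips the duplicate row instead of raising, so the application does not have to catch the violation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Same "local" table as in the question, plus a UNIQUE constraint on serialNumber.
conn.execute(
    'CREATE TABLE "local" ('
    '"id" INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL, '
    '"serialNumber" TEXT UNIQUE, '
    '"location" TEXT)'
)

def local_id(conn, serial):
    """Insert the serial number if it is new, then return its row id."""
    conn.execute('INSERT OR IGNORE INTO "local" ("serialNumber") VALUES (?)', (serial,))
    row = conn.execute(
        'SELECT "id" FROM "local" WHERE "serialNumber" = ?', (serial,)
    ).fetchone()
    return row[0]

first = local_id(conn, "XXXX")
second = local_id(conn, "XXXX")  # duplicate: ignored, same id comes back
count = conn.execute('SELECT COUNT(*) FROM "local"').fetchone()[0]
print(first, second, count)  # 1 1 1

# A plain INSERT of the same serial number raises, as the answer warns.
try:
    conn.execute('INSERT INTO "local" ("serialNumber") VALUES (?)', ("XXXX",))
    plain_insert_failed = False
except sqlite3.IntegrityError:
    plain_insert_failed = True
print(plain_insert_failed)  # True
```

The returned id can then be passed straight into the insert for "data", which removes the need to repeat the serial-number condition there.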

Using MySQL's "IN" function where the target is a column?

In a certain TABLE, I have a VARTEXT field which includes comma-separated values of country codes. The field is named cc_list. Typical entries look like the following:
'DE,US,IE,GB'
'IT,CA,US,FR,BE'
Now given a country code, I want to be able to efficiently find which records include that country. Obviously there's no point in indexing this field.
I can do the following
SELECT * from TABLE where cc_list LIKE '%US%';
But this is inefficient.
Since the "IN" function is supposed to be efficient (it bin-sorts the values), I was thinking along the lines of
SELECT * from TABLE where 'US' IN cc_list
But this doesn't work - I think the 2nd operand of IN needs to be a list of values, not a string. Is there a way to convert a CSV string to a list of values?
Any other suggestions? Thanks!
SELECT *
FROM MYTABLE
WHERE FIND_IN_SET('US', cc_list)
In a certain TABLE, I have a VARTEXT field which includes comma-separated values of country codes.
If you want your queries to be efficient, you should create a many-to-many link table:
CREATE TABLE table_country (cc CHAR(2) NOT NULL, tableid INT NOT NULL, PRIMARY KEY (cc, tableid))
SELECT *
FROM table_country tc
JOIN mytable t
ON t.id = tc.tableid
WHERE tc.cc = 'US'
Alternatively, you can set ft_min_word_len to 2, create a FULLTEXT index on your column and query like this:
CREATE FULLTEXT INDEX fx_mytable_cclist ON mytable (cc_list);
SELECT *
FROM MYTABLE
WHERE MATCH(cc_list) AGAINST('+US' IN BOOLEAN MODE)
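When this answer was written, this only worked for MyISAM tables (InnoDB has supported FULLTEXT indexes since MySQL 5.6, where the corresponding setting is innodb_ft_min_token_size), and the argument should be a literal string (you won't be able to join on this condition).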
The first rule of normalization says you should change multi-value columns such as cc_list into a single value field for this very reason.
Preferably, move it into its own table with IDs for each country code, plus a pivot table to support a many-to-many relationship.
CREATE TABLE my_table (
my_id INT(11) UNSIGNED NOT NULL AUTO_INCREMENT,
mystuff VARCHAR(255) NOT NULL, -- MySQL requires a length for VARCHAR; 255 is a placeholder
PRIMARY KEY(my_id)
);
# this is the pivot table
CREATE TABLE my_table_countries (
my_id INT(11) UNSIGNED NOT NULL,
country_id SMALLINT(5) UNSIGNED NOT NULL,
PRIMARY KEY(my_id, country_id)
);
CREATE TABLE countries (
country_id SMALLINT(5) UNSIGNED NOT NULL AUTO_INCREMENT,
country_code CHAR(2) NOT NULL,
country_name VARCHAR(100) NOT NULL,
PRIMARY KEY (country_id)
);
Then you can query it making use of indexes:
SELECT * FROM my_table JOIN my_table_countries USING (my_id) JOIN countries USING (country_id) WHERE country_code = 'DE'
SELECT * FROM my_table JOIN my_table_countries USING (my_id) JOIN countries USING (country_id) WHERE country_code IN('DE','US')
You may have to group the results by my_id.
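
To show why grouping can be necessary, here is a small sketch of the three-table layout. It runs on Python's built-in sqlite3 module instead of MySQL (an assumption so the example is runnable as-is), with simplified column types and invented sample rows; the helper `matching_ids` is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE my_table (
        my_id   INTEGER PRIMARY KEY,
        mystuff TEXT NOT NULL
    );
    CREATE TABLE countries (
        country_id   INTEGER PRIMARY KEY,
        country_code TEXT NOT NULL UNIQUE,
        country_name TEXT NOT NULL
    );
    -- pivot table: one row per (record, country) pair
    CREATE TABLE my_table_countries (
        my_id      INTEGER NOT NULL,
        country_id INTEGER NOT NULL,
        PRIMARY KEY (my_id, country_id)
    );
    INSERT INTO my_table VALUES (1, 'first'), (2, 'second');
    INSERT INTO countries VALUES (1, 'DE', 'Germany'), (2, 'US', 'United States'),
                                 (3, 'IT', 'Italy');
    -- record 1 lists DE and US, record 2 lists IT and US
    INSERT INTO my_table_countries VALUES (1, 1), (1, 2), (2, 3), (2, 2);
""")

def matching_ids(conn, codes, grouped):
    """Ids of my_table rows linked to any of the given country codes."""
    placeholders = ", ".join("?" for _ in codes)
    group_by = "GROUP BY my_id" if grouped else ""
    sql = f"""
        SELECT my_id
        FROM my_table
        JOIN my_table_countries USING (my_id)
        JOIN countries USING (country_id)
        WHERE country_code IN ({placeholders})
        {group_by}
        ORDER BY my_id
    """
    return [row[0] for row in conn.execute(sql, codes)]

print(matching_ids(conn, ["DE"], grouped=False))        # [1]
print(matching_ids(conn, ["DE", "US"], grouped=False))  # [1, 1, 2]
print(matching_ids(conn, ["DE", "US"], grouped=True))   # [1, 2]
```

Record 1 matches both 'DE' and 'US', so without the GROUP BY it appears once per matching country.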
find_in_set seems to be the MySQL function you want. If you could actually store those comma-separated strings as MySQL sets (no more than 64 possible countries, or splitting countries into two groups of no more than 64 each), you could keep using find_in_set and go a bit faster.
There's no efficient way to find what you want. A table scan will be necessary. Putting multiple values into a single text field is a terrible misuse of relational database technology. If you refactor (if you have access to the database structure) so that the country codes are properly stored in a separate table you will be able to easily and quickly retrieve the data you want.
One approach that I've used successfully before (not on mysql, though) is to place a trigger on the table that splits the values (based on a specific delimiter) into discrete values, inserting them into a sub-table. Your select can then look like this:
SELECT * from TABLE where 'US' IN
(
select cc_list_name from cc_list_subtable
where cc_list_subtable.table_id = TABLE.id
)
where the trigger parses cc_list in TABLE into separate entries in column cc_list_name in table cc_list_subtable. It involves a bit of work in the trigger, too, as every change to TABLE means that the associated rows in cc_list_subtable have to be deleted, updated or inserted as appropriate. Still, it is an approach that works in situations where the original table TABLE has to retain its original structure but you are free to adapt the query as you see fit.