Oracle Data Migration [data modification] - Data Tuning - sql

I'm facing a data migration: my goal is to update 2.5M rows in less than 8 hours, because the customer has only a limited window of time in which the service can be deactivated. Moreover, the table can't be locked during this execution because it is used by other procedures; I can only lock the records being changed. The execution will be done through a batch process.
Migration probably isn't the correct word in this case; "altering data" might be more accurate...
System: Oracle 11g
Table information
Table name: Tab1
Total rows: 520,000,000
Avg row len: 57
DESC Tab1;
Name Null? Type
---------------- -------- -----------
t_id NOT NULL NUMBER
t_fk1_id NUMBER
t_fk2_id NUMBER
t_start_date NOT NULL DATE
t_end_date DATE
t_del_flag NOT NULL NUMBER(1)
t_flag1 NOT NULL NUMBER(1)
t_flag2 NOT NULL NUMBER(1)
t_creation_date DATE
t_creation_user NUMBER(10)
t_last_update DATE
t_user_update NUMBER(10)
t_flag3 NUMBER(1)
Indexes are:
T_ID_PK [t_id] UNIQUE
T_IN_1 [t_fk2_id,t_fk1_id,t_start_date,t_del_flag] NONUNIQUE
T_IN_2 [t_last_update,t_fk2_id] NONUNIQUE
T_IN_3 [t_fk2_id,t_fk1_id] NONUNIQUE
So far I've thought of some possible solutions, and I've already tested most of them:
Insert + delete: select the existing data, insert new records with the needed modifications, and delete the old ones [this turned out to be the slowest method, ~21h]
Merge: use the MERGE command to update the existing data [this turned out to be the fastest method, ~16h]
Update: update the existing data in place [~18h]
With the above solutions I've faced some issues: when executed with the /*+ PARALLEL(x) */ option the table was locked, and the /*+ RESULT_CACHE */ hint doesn't seem to affect the selection time at all.
My latest idea is to partition the table by a new column, use that to avoid table locking, and proceed with solution 1.
Here is the query used for the Merge option (for the other two it is more or less the same):
DECLARE
  v_recordset    NUMBER;
  v_row_count    NUMBER;
  v_start_subset NUMBER;
  v_tot_loops    NUMBER;
BEGIN
  -- values set manually for example purposes; I've used the same values
  v_recordset := 10000;
  v_tot_loops := 10000;
  BEGIN
    SELECT NVL(MIN(MOD(m_id, v_recordset)), 99999)
      INTO v_start_subset
      FROM MIGRATION_TABLE
     WHERE m_status = 0; -- 0=not migrated, 1=migrated
  END;
  FOR v_n_subset IN v_start_subset..v_tot_loops
  LOOP
    BEGIN
      MERGE INTO Tab1 T1
      USING (
        SELECT m.m_new_id, c2.c_id, t.t_id
          FROM MIGRATION_TABLE m
          JOIN Tab1 t        ON t.t_fk_id = m.m_old_id
          JOIN ChildTable c  ON c.c_id = t.t_fk2_id
          JOIN ChildTable c2 ON c.c_name = c2.c_name -- c_name is a UNIQUE index of ChildTable
         WHERE MOD(m.m_id, v_recordset) = v_n_subset
           AND c.c_fk_id = old_product_id   -- value obtained from another subsystem
           AND c2.c_fk_id = new_product_id  -- value obtained from another subsystem
           AND t.t_del_flag = 0             -- not deleted items
      ) T2
      ON (T1.t_id = T2.t_id)
      WHEN MATCHED THEN UPDATE SET
        T1.t_fk_id = T2.m_new_id,
        T1.t_fk2_id = T2.c_id,
        T1.t_last_update = trunc(sysdate);
      -- Update the record as migrated and proceed
      COMMIT;
    EXCEPTION WHEN OTHERS THEN
      ROLLBACK;
    END;
  END LOOP;
END;
In the script above I've removed the parallel and cache options, but I've already tested it with both and didn't obtain any significant improvement.
Could anyone please help me with this? In more than a week I haven't been able to reach the desired timing. Any ideas?
MIGRATION_TABLE
CREATE TABLE MIGRATION_TABLE(
  m_customer_from VARCHAR2(5 BYTE),
  m_customer_to VARCHAR2(5 BYTE),
  m_old_id NUMBER(10,0) NOT NULL,
  m_new_id NUMBER(10,0) NOT NULL,
  m_status VARCHAR2(100 BYTE),
  CONSTRAINT M_MIG_PK_1 PRIMARY KEY
  (
    m_old_id
  )
  ENABLE
);
CREATE UNIQUE INDEX M_MIG_PK_1 ON MIGRATION_TABLE (m_old_id ASC);
ChildTable
CREATE TABLE ChildTable(
  c_id NUMBER(10, 0) NOT NULL,
  c_fk_id NUMBER(10, 0),
  c_name VARCHAR2(100 BYTE),
  c_date DATE,
  c_note VARCHAR2(100 BYTE),
  CONSTRAINT C_CT_PK_1 PRIMARY KEY
  (
    c_id
  )
  ENABLE
);
CREATE UNIQUE INDEX C_CT_PK_1 ON ChildTable (c_id ASC);
CREATE UNIQUE INDEX C_CT_PK_2 ON ChildTable (c_name ASC, c_fk_id ASC);

Method 2 is similar to Method 1, but it uses ROWIDs instead of the primary key. In theory, it should therefore be a bit faster.
CREATE TABLE migration_temp NOLOGGING AS
SELECT t.t_id,
t.rowid AS rid,
m.m_new_id AS new_fk1_id,
c2.c_id AS new_fk2_id
FROM MIGRATION_TABLE m
JOIN Tab1 t ON t.t_fk1_id = m.m_old_id
JOIN ChildTable c1 ON c1.c_id = t.t_fk2_id
JOIN ChildTable c2 ON c1.c_name = c2.c_name
WHERE t.t_del_flag = 0
ORDER BY t.rowid;
EXEC DBMS_STATS.GATHER_TABLE_STATS(null,'migration_temp');
MERGE INTO Tab1 t USING migration_temp m ON (t.rowid = m.rid)
WHEN MATCHED THEN UPDATE SET
t.t_fk1_id = m.new_fk1_id,
t.t_fk2_id = m.new_fk2_id,
t.t_last_update = trunc(sysdate);
You could think of batching the MERGE based on blocks of ROWIDs. Rows with adjacent ROWIDs tend to be co-located in the same blocks, so it should be a bit faster; a sketch of that batching follows.
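A minimal sketch of that idea, assuming the CTAS above is replaced by a variant with an added chunk column; the chunk count of 100 and the per-chunk COMMIT are my assumptions, not part of the answer:
-- Variant of the migration_temp definition above, with a chunk column that
-- splits the rows into 100 ROWID-ordered buckets (bucket count is an assumption).
CREATE TABLE migration_temp NOLOGGING AS
SELECT s.*, NTILE(100) OVER (ORDER BY s.rid) AS chunk
FROM (
    SELECT t.t_id,
           t.rowid    AS rid,
           m.m_new_id AS new_fk1_id,
           c2.c_id    AS new_fk2_id
    FROM MIGRATION_TABLE m
    JOIN Tab1 t        ON t.t_fk1_id = m.m_old_id
    JOIN ChildTable c1 ON c1.c_id = t.t_fk2_id
    JOIN ChildTable c2 ON c1.c_name = c2.c_name
    WHERE t.t_del_flag = 0
) s;
-- Merge one ROWID-ordered chunk at a time.
BEGIN
  FOR i IN 1 .. 100 LOOP
    MERGE INTO Tab1 t
    USING (SELECT * FROM migration_temp WHERE chunk = i) m
    ON (t.rowid = m.rid)
    WHEN MATCHED THEN UPDATE SET
      t.t_fk1_id = m.new_fk1_id,
      t.t_fk2_id = m.new_fk2_id,
      t.t_last_update = TRUNC(SYSDATE);
    COMMIT; -- smaller undo/redo per transaction, at the cost of all-or-nothing semantics
  END LOOP;
END;
/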

Wow, 520 million rows! However, updating 2.5 million of them is only 0.5%, so that should be doable. Not knowing your data, my first assumption is that the self-join of Tab1 x Tab1 inside the MERGE takes up most of the time, possibly also the many joins to the migration and child tables. And the indexes T_IN_1, 2, and 3 need maintenance, too.
As you say the rows to be updated are fixed, I'd try to prepare the heavy work up front. This doesn't lock the table and wouldn't count towards the downtime:
CREATE TABLE migration_temp NOLOGGING AS
SELECT t.t_id,
m.m_new_id AS new_fk1_id,
c2.c_id AS new_fk2_id
FROM MIGRATION_TABLE m
JOIN Tab1 t ON t.t_fk1_id = m.m_old_id
JOIN ChildTable c1 ON c1.c_id = t.t_fk2_id
JOIN ChildTable c2 ON c1.c_name = c2.c_name
WHERE t.t_del_flag = 0;
I omitted the bit with the old/new product_ids because I didn't fully understand how it should work, but that is hopefully not a problem.
Method 1 would be a join via primary keys:
ALTER TABLE migration_temp ADD CONSTRAINT pk_migration_temp PRIMARY KEY(t_id);
EXEC DBMS_STATS.GATHER_TABLE_STATS(null,'migration_temp');
MERGE INTO Tab1 t USING migration_temp m ON (t.t_id = m.t_id)
WHEN MATCHED THEN UPDATE SET
t.t_fk1_id = m.new_fk1_id,
t.t_fk2_id = m.new_fk2_id,
t.t_last_update = trunc(sysdate);
I'm not a fan of batched updates. As you have time estimates, it looks like you have a test system. I'd suggest giving it a go and trying it in one batch.

If methods 1 and 2 are still too slow, you could follow your partitioning idea. For instance, introduce a column to distinguish the rows to be migrated. Because of the DEFAULT ... NOT NULL, this will be very fast:
ALTER TABLE Tab1 ADD (todo NUMBER DEFAULT 0 NOT NULL);
Now partition your table into two partitions: one with the migration data, one with the rest that you will not touch. I don't have much experience with introducing partitions while the application is running, but I think it is solvable, for instance with online redefinition (DBMS_REDEFINITION) or, on 12.2 and later, with:
ALTER TABLE Tab1 MODIFY
PARTITION BY LIST (todo) (
PARTITION pdonttouch VALUES (0),
PARTITION pmigration VALUES (1)
) ONLINE UPDATE INDEXES (
T_ID_PK GLOBAL, T_IN_1 GLOBAL,
T_IN_2 GLOBAL, T_IN_3 GLOBAL
);
Now you can flag the rows to be moved. This can be done row by row, doesn't affect the other processes, and should not count towards your downtime. The migration rows will move from partition PDONTTOUCH to partition PMIGRATION, therefore you need to enable row movement (a sketch of the flagging update follows the two statements below).
ALTER TABLE Tab1 ENABLE ROW MOVEMENT;
UPDATE Tab1 SET todo=1 WHERE .... JOIN ...;
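A minimal sketch of that flagging update, assuming the migration_temp table from Method 1 has already been built (otherwise the same joins as in its definition would go into the EXISTS):
-- Hypothetical sketch: flag the rows identified in migration_temp.
UPDATE Tab1 t
   SET t.todo = 1
 WHERE EXISTS (SELECT 1
                 FROM migration_temp m
                WHERE m.t_id = t.t_id);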
Now you can work on the partition PMIGRATION and update the data there. This should be much faster than on the original table, as the size of the partition is only 0.5% of the whole table. Don't know about the indexes, though.
Theoretically, you could create a table with the same structure and data as PMIGRATION, work on the table, and once done, swap the partition and the working table with EXCHANGE PARTITION. Don't know about the indexes, again.
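A rough sketch of that exchange, assuming the partitioning above is in place; the working-table name is an assumption, and the global indexes are only maintained because of the UPDATE GLOBAL INDEXES clause:
-- Hypothetical working table holding a copy of the migration partition.
CREATE TABLE tab1_work NOLOGGING AS
SELECT * FROM Tab1 PARTITION (pmigration);
-- ... run the updates against tab1_work instead of Tab1 ...
-- Swap the updated working table back in.
ALTER TABLE Tab1
  EXCHANGE PARTITION pmigration WITH TABLE tab1_work
  WITHOUT VALIDATION
  UPDATE GLOBAL INDEXES;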

Related

Delete on Oracle DB when joining with a pipelined table function

I have a table called CUSTOMERS from which I want to delete all entries that are not present in VALUE_CUSTOMERS. VALUE_CUSTOMERS is a Pipelined Table Function which builds on customers.
DELETE
(
SELECT *
FROM CUSTOMERS
LEFT JOIN
(
SELECT
1 AS DELETABLE,
VC.*
FROM
(
CUSTOMER_PACKAGE.VALUE_CUSTOMERS(TRUNC(SYSDATE) - 30)) VC
)
USING
(
FIRST_NAME, LAST_NAME, DATE_OF_BIRTH
)
WHERE
DELETABLE IS NULL
)
;
When I try to execute the statement, I get the Error:
ORA-01752: cannot delete from view without exactly one key-preserved
table
It looks like your example has wrong syntax (it lacks the TABLE keyword) - I've tested it on Oracle 12c, so maybe it works on newer versions. Below are some ideas, based on Oracle 12c.
You've got multiple options here:
Save result of CUSTOMER_PACKAGE.VALUE_CUSTOMERS to some temporary table and use it in your query
CREATE GLOBAL TEMPORARY TABLE TMP_CUSTOMERS (
FIRST_NAME <type based on CUSTOMERS.FIRST_NAME>
, LAST_NAME <type based on CUSTOMERS.LAST_NAME>
, DATE_OF_BIRTH <type based on CUSTOMERS.DATE_OF_BIRTH>
)
ON COMMIT DELETE ROWS;
Then in code:
INSERT INTO TMP_CUSTOMERS(FIRST_NAME, LAST_NAME, DATE_OF_BIRTH)
SELECT VC.FIRST_NAME, VC.LAST_NAME, VC.DATE_OF_BIRTH
FROM TABLE(CUSTOMER_PACKAGE.VALUE_CUSTOMERS(TRUNC(SYSDATE) - 30)) VC
;
-- and then:
DELETE FROM CUSTOMERS C
WHERE NOT EXISTS(
SELECT 1
FROM TMP_CUSTOMERS TMP_C
-- Be AWARE that NULLs are not handled here
-- so it's correct only if FIRST_NAME, LAST_NAME, DATE_OF_BIRTH are not nullable
WHERE C.FIRST_NAME = TMP_C.FIRST_NAME
AND C.LAST_NAME = TMP_C.LAST_NAME
AND C.DATE_OF_BIRTH = TMP_C.DATE_OF_BIRTH
)
;
-- If `CUSTOMER_PACKAGE.VALUE_CUSTOMERS` can return a lot of rows,
-- then you should create indexes on FIRST_NAME, LAST_NAME, DATE_OF_BIRTH
-- or maybe even 1 multi-column index on all of above columns.
Also, consider rewriting your query: insert just the customer IDs into TMP_CUSTOMERS and then delete based on those ids.
The main risk here is that the data could be changed between these 2 operations and you should consider that issue.
You can save the result in a collection variable and then do a bulk delete with a FORALL loop.
If the number of rows to delete could be big, then you should extend this example with the LIMIT clause (a sketch of that variant follows the block below). Even with LIMIT, you could still encounter problems - like not enough space in UNDO - so this solution is only good for a small amount of data. The risk here is the same as in the example above.
DECLARE
  TYPE t_tab IS TABLE OF NUMBER INDEX BY PLS_INTEGER;
  v_tab t_tab;
BEGIN
  SELECT CUSTOMERS.ID -- I hope you have some kind of primary key there...
    BULK COLLECT INTO v_tab
    FROM CUSTOMERS
    LEFT JOIN
    (
      SELECT
        1 AS DELETABLE,
        VC.*
      FROM TABLE(CUSTOMER_PACKAGE.VALUE_CUSTOMERS(TRUNC(SYSDATE) - 30)) VC
    )
    USING
    (
      FIRST_NAME, LAST_NAME, DATE_OF_BIRTH
    )
    WHERE
      DELETABLE IS NULL
  ;
  FORALL idx IN 1..v_tab.COUNT()
    DELETE FROM CUSTOMERS
    WHERE CUSTOMERS.ID = v_tab(idx)
  ;
END;
/
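A hedged sketch of that LIMIT-based extension; the batch size of 10000 and the per-batch COMMIT are my assumptions, not part of the original answer:
DECLARE
  CURSOR c_del IS
    SELECT CUSTOMERS.ID
      FROM CUSTOMERS
      LEFT JOIN
      (
        SELECT 1 AS DELETABLE, VC.*
        FROM TABLE(CUSTOMER_PACKAGE.VALUE_CUSTOMERS(TRUNC(SYSDATE) - 30)) VC
      )
      USING (FIRST_NAME, LAST_NAME, DATE_OF_BIRTH)
     WHERE DELETABLE IS NULL;
  TYPE t_tab IS TABLE OF NUMBER INDEX BY PLS_INTEGER;
  v_tab t_tab;
BEGIN
  OPEN c_del;
  LOOP
    FETCH c_del BULK COLLECT INTO v_tab LIMIT 10000; -- batch size is an assumption
    EXIT WHEN v_tab.COUNT = 0;
    FORALL idx IN 1..v_tab.COUNT
      DELETE FROM CUSTOMERS
       WHERE CUSTOMERS.ID = v_tab(idx);
    COMMIT; -- keeps UNDO usage per transaction small; sacrifices atomicity
  END LOOP;
  CLOSE c_del;
END;
/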
Do it completely differently - that's preferable.
For example: move the logic from CUSTOMER_PACKAGE.VALUE_CUSTOMERS into a view and build the delete statement on that view (sketched below). Remember to change CUSTOMER_PACKAGE.VALUE_CUSTOMERS to use that new view as well (DRY principle).
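A hypothetical sketch of that approach. The view body below is only a placeholder for whatever the pipelined function does today, and the LAST_ORDER_DATE column is invented purely for illustration:
-- Placeholder view: stands in for the logic currently inside
-- CUSTOMER_PACKAGE.VALUE_CUSTOMERS (LAST_ORDER_DATE is an invented column).
CREATE OR REPLACE VIEW VALUE_CUSTOMERS_V AS
SELECT C.FIRST_NAME, C.LAST_NAME, C.DATE_OF_BIRTH
  FROM CUSTOMERS C
 WHERE C.LAST_ORDER_DATE >= TRUNC(SYSDATE) - 30;
-- Delete everything not present in the view (same NULL caveat as above).
DELETE FROM CUSTOMERS C
 WHERE NOT EXISTS (
         SELECT 1
           FROM VALUE_CUSTOMERS_V V
          WHERE V.FIRST_NAME    = C.FIRST_NAME
            AND V.LAST_NAME     = C.LAST_NAME
            AND V.DATE_OF_BIRTH = C.DATE_OF_BIRTH
       );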

Is it possible that nested loop joins different data to same id in different loops

We have an interesting phenomenon with a SQL statement on an Oracle database that we could not reproduce. The example below was simplified; we believe not too much, but possibly it is oversimplified.
Main question: Given a nested loop, where the inner (not driving) table has an analytic function whose result is ambiguous (multiple rows could be the first row of the order by), is it feasible that said analytic function returns different results for different outer loop iterations?
Secondary question: If yes, how can we reproduce this behaviour?
If no, do you have any other ideas why this query would produce multiple rows for the same company?
Not the question: Should the assumption about what is wrong be correct, correcting the SQL would be easy: just make the order by in the analytic function unambiguous, e.g. by adding the id column as a second criterion.
Problem:
Company has a n:m relation to owner and a 1:n relation to address.
The SQL joins all tables, reading only a single address per company by making use of the analytic function row_number(), groups by company AND address, and accumulates the owner names.
We use the query for multiple purposes; the other purposes involve reading the “best” address, the problematic one does not. We got multiple error reports with results like this:
Company A has owners N1, N2, N3.
Result was
Company | Owner list
A       | N1
A       | N2, N3
All reported cases involve companies with multiple “best” addresses, hence the theory that somehow the subquery that should deliver a single address is broken. But we could not reproduce the result.
Full Details:
(For smaller numbers, listagg() is the function used originally, but it fails for bigger numbers; count(*) should be a suitable replacement.)
--cleanup
DROP TABLE rau_companyowner;
DROP TABLE rau_owner;
DROP TABLE rau_address;
DROP TABLE rau_company;
--create structure
CREATE TABLE rau_company (
id NUMBER CONSTRAINT pk_rau_company PRIMARY KEY USING INDEX (CREATE UNIQUE INDEX idx_rau_company_p ON rau_company(id))
);
CREATE TABLE rau_owner (
id NUMBER CONSTRAINT pk_rau_owner PRIMARY KEY USING INDEX (CREATE UNIQUE INDEX idx_rau_owner_p ON rau_owner(id)),
name varchar2(1000)
);
CREATE TABLE rau_companyowner (
company_id NUMBER,
owner_id NUMBER,
CONSTRAINT pk_rau_companyowner PRIMARY KEY (company_id, owner_id) USING INDEX (CREATE UNIQUE INDEX idx_rau_companyowner_p ON rau_companyowner(company_id, owner_id)),
CONSTRAINT fk_companyowner_company FOREIGN KEY (company_id) REFERENCES rau_company(id),
CONSTRAINT fk_companyowner_owner FOREIGN KEY (owner_id) REFERENCES rau_owner(id)
);
CREATE TABLE rau_address (
id NUMBER CONSTRAINT pk_rau_address PRIMARY KEY USING INDEX (CREATE UNIQUE INDEX idx_rau_address_p ON rau_address(id)),
company_id NUMBER,
prio NUMBER NOT NULL,
street varchar2(1000),
CONSTRAINT fk_address_company FOREIGN KEY (company_id) REFERENCES rau_company(id)
);
--create testdata
DECLARE
  TYPE t_address IS TABLE OF rau_address%rowtype INDEX BY pls_integer;
  address t_address;
  TYPE t_owner IS TABLE OF rau_owner%rowtype INDEX BY pls_integer;
  owner t_owner;
  TYPE t_companyowner IS TABLE OF rau_companyowner%rowtype INDEX BY pls_integer;
  companyowner t_companyowner;
  ii pls_integer;
  company_id pls_integer := 1;
  test_count PLS_INTEGER := 10000;
  --test_count PLS_INTEGER := 50;
BEGIN
  --rau_company
  INSERT INTO rau_company VALUES (company_id);
  --rau_owner, rau_companyowner
  FOR ii IN 1 .. test_count
  LOOP
    owner(ii).id := ii;
    owner(ii).name := 'N' || to_char(ii);
    companyowner(ii).company_id := company_id;
    companyowner(ii).owner_id := ii;
  END LOOP;
  FORALL ii IN owner.FIRST .. owner.LAST
    INSERT INTO rau_owner VALUES (owner(ii).id, owner(ii).name);
  FORALL ii IN companyowner.FIRST .. companyowner.LAST
    INSERT INTO rau_companyowner VALUES (companyowner(ii).company_id, companyowner(ii).owner_id);
  --rau_address
  FOR ii IN 1 .. test_count
  LOOP
    address(ii).id := ii;
    address(ii).company_id := company_id;
    address(ii).prio := 1;
    address(ii).street := 'S' || to_char(ii);
  END LOOP;
  FORALL ii IN address.FIRST .. address.LAST
    INSERT INTO rau_address VALUES (address(ii).id, address(ii).company_id, address(ii).prio, address(ii).street);
  COMMIT;
END;
/
-- check testdata
SELECT 'rau_company' tab, COUNT(*) count FROM rau_company
UNION all
SELECT 'rau_owner', COUNT(*) FROM rau_owner
UNION all
SELECT 'rau_companyowner', COUNT(*) FROM rau_companyowner
UNION all
SELECT 'rau_address', COUNT(*) FROM rau_address;
-- the sql: NL with address as inner loop enforced
-- 'order BY prio' is ambiguous because all addresses have the same prio
-- => the single row in ad could be any row
SELECT /*+ leading(hh hhoo oo ad) use_hash(hhoo oo) USE_NL(hh ad) */
hh.id company,
ad.street,
-- LISTAGG(oo.name || ', ') within group (order by oo.name) owner_list,
count(oo.id) owner_count
FROM rau_company hh
LEFT JOIN rau_companyowner hhoo ON hh.id = hhoo.company_id
LEFT JOIN rau_owner oo ON hhoo.owner_id = oo.id
LEFT JOIN (
SELECT *
FROM (
SELECT company_id, street,
row_number() over ( partition by company_id order BY prio asc ) as row_num
FROM rau_address
)
WHERE row_num = 1
) ad ON hh.id = ad.company_id
GROUP BY hh.id,
ad.street;
Chris Saxon was so nice to answer my question: https://asktom.oracle.com/pls/apex/f?p=100:11:::::P11_QUESTION_ID:9546263400346154452
In short: As long as the order by is ambiguous (non-deterministic), there will always be a chance for different results even within the same sql.
To reproduce add this to my test data:
ALTER TABLE rau_address PARALLEL 8;
and try the select at the bottom; it should deliver multiple rows.
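For completeness, a minimal sketch of the fix already mentioned in the question: add the unique id as a tiebreaker so the window ordering becomes deterministic and the same "best" address is picked on every execution:
SELECT company_id, street,
       row_number() over ( partition by company_id order BY prio asc, id asc ) as row_num
FROM rau_address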

mariadb not using all fields of composite index

MariaDB is not fully using the composite index. The fast select and the slow select both return the same data, but EXPLAIN shows that the slow select uses only the ix_test_relation.entity_id part of the index and does not use the ix_test_relation.stamp part.
I tried many variants (inner join, with, from) but couldn't make MariaDB use both fields of the index together with the recursive query. I understand that I need to tell MariaDB to materialize the recursive query somehow.
Please help me optimize the slow select, which uses a recursive query, so that it is of similar speed to the fast select.
Some details about the task... I need to query user activity. One user activity record may relate to multiple entities. Entities are hierarchical. I need to query user activity for some parent entity and all its children for a specified stamp range. Stamp is simplified from TIMESTAMP to BIGINT for demonstration purposes. There can be a lot (1 million) of entities, and each entity may relate to a lot (1 million) of user activity entries. Entity hierarchy depth is expected to be around 10 levels. I assume that the stamp range used reduces the number of user activity records to 10-100. I denormalized the schema and copied stamp from test_entry to test_relation to be able to include it in the test_relation index.
I use 10.4.11-Mariadb-1:10:4.11+maria~bionic.
I can upgrade or patch or whatever mariadb if needed, I have full control over building docker image.
Schema:
CREATE TABLE test_entity(
id BIGINT NOT NULL,
parent_id BIGINT NULL,
CONSTRAINT pk_test_entity PRIMARY KEY (id),
CONSTRAINT fk_test_entity_pid FOREIGN KEY (parent_id) REFERENCES test_entity(id)
);
CREATE TABLE test_entry(
id BIGINT NOT NULL,
name VARCHAR(100) NOT NULL,
stamp BIGINT NOT NULL,
CONSTRAINT pk_test_entry PRIMARY KEY (id)
);
CREATE TABLE test_relation(
entry_id BIGINT NOT NULL,
entity_id BIGINT NOT NULL,
stamp BIGINT NOT NULL,
CONSTRAINT pk_test_relation PRIMARY KEY (entry_id, entity_id),
CONSTRAINT fk_test_relation_erid FOREIGN KEY (entry_id) REFERENCES test_entry(id),
CONSTRAINT fk_test_relation_enid FOREIGN KEY (entity_id) REFERENCES test_entity(id)
);
CREATE INDEX ix_test_relation ON test_relation(entity_id, stamp);
CREATE SEQUENCE sq_test_entry;
Test data:
CREATE OR REPLACE PROCEDURE test_insert()
BEGIN
DECLARE v_entry_id BIGINT;
DECLARE v_parent_entity_id BIGINT;
DECLARE v_child_entity_id BIGINT;
FOR i IN 1..1000 DO
SET v_parent_entity_id = i * 2;
SET v_child_entity_id = i * 2 + 1;
INSERT INTO test_entity(id, parent_id)
VALUES(v_parent_entity_id, NULL);
INSERT INTO test_entity(id, parent_id)
VALUES(v_child_entity_id, v_parent_entity_id);
FOR j IN 1..1000000 DO
SELECT NEXT VALUE FOR sq_test_entry
INTO v_entry_id;
INSERT INTO test_entry(id, name, stamp)
VALUES(v_entry_id, CONCAT('entry ', v_entry_id), j);
INSERT INTO test_relation(entry_id, entity_id, stamp)
VALUES(v_entry_id, v_parent_entity_id, j);
INSERT INTO test_relation(entry_id, entity_id, stamp)
VALUES(v_entry_id, v_child_entity_id, j);
END FOR;
END FOR;
END;
CALL test_insert;
Slow select (> 100ms):
SELECT entry_id
FROM test_relation TR
WHERE TR.entity_id IN (
WITH RECURSIVE recursive_child AS (
SELECT id
FROM test_entity
WHERE id IN (2, 4)
UNION ALL
SELECT C.id
FROM test_entity C
INNER JOIN recursive_child P
ON P.id = C.parent_id
)
SELECT id
FROM recursive_child
)
AND TR.stamp BETWEEN 6 AND 8
Fast select (1-2ms):
SELECT entry_id
FROM test_relation TR
WHERE TR.entity_id IN (2,3,4,5)
AND TR.stamp BETWEEN 6 AND 8
UPDATE 1
I can demonstrate the problem with an even shorter example.
Explicitly store the required entity_id records in a temporary table:
CREATE OR REPLACE TEMPORARY TABLE tbl
WITH RECURSIVE recursive_child AS (
SELECT id
FROM test_entity
WHERE id IN (2, 4)
UNION ALL
SELECT C.id
FROM test_entity C
INNER JOIN recursive_child P
ON P.id = C.parent_id
)
SELECT id
FROM recursive_child
Try running the select using the temporary table (below). The select is still slow, but the only difference from the fast query now is that the IN clause queries a table instead of inline constants.
SELECT entry_id
FROM test_relation TR
WHERE TR.entity_id IN (SELECT id FROM tbl)
AND TR.stamp BETWEEN 6 AND 8
For your queries (both of them) it looks to me like you should, as you mentioned, flip the column order on your compound index:
CREATE INDEX ix_test_relation ON test_relation(stamp, entity_id);
Why?
Your queries have a range filter TR.stamp BETWEEN 2 AND 3 on that column. For a range filter to use an index range scan (whether on a TIMESTAMP or a BIGINT column), the column being filtered must be first in a multicolumn index.
You also want a sargable filter, that is, something like this:
TR.stamp >= CURDATE() - INTERVAL 7 DAY
AND TR.stamp < CURDATE()
in place of
DATE(TR.stamp) BETWEEN DATE(NOW() - INTERVAL 7 DAY) AND DATE(NOW())
That is, don't put a function on the column you're scanning in your WHERE clause.
With a structured query like your first one, the query planner turns it into several queries. You can see this with ANALYZE FORMAT=JSON. The planner may choose different indexes and/or different chunks of indexes for each of those subqueries.
And, a word to the wise: don't get too wrapped around the axle trying to outguess the query planner built into the DBMS. It's an extraordinarily complex and highly wrought piece of software, created by decades of programming work by world-class experts in optimization. Our job as MariaDB / MySQL users is to find the right indexes.
The order of columns in a composite index matters. (O.Jones explains it nicely -- using SQL that has been removed from the Question?!)
I would rewrite
SELECT entry_id
FROM test_relation TR
WHERE TR.entity_id IN (SELECT id FROM tbl)
AND TR.stamp BETWEEN 6 AND 8
as
SELECT TR.entry_id
FROM tbl
JOIN test_relation TR ON tbl.id = TR.entity_id
WHERE TR.stamp BETWEEN 6 AND 8
or
SELECT entry_id
FROM test_relation TR
WHERE TR.stamp BETWEEN 6 AND 8
AND EXISTS ( SELECT 1 FROM tbl
WHERE tbl.id = TR.entity_id )
And have these in either case:
TR: INDEX(stamp, entity_id, entry_id) -- With `stamp` first
tbl: INDEX(id) -- maybe
Since tbl is a freshly built TEMPORARY TABLE, and it seems that only 3 rows need checking, it may not be worth adding INDEX(id).
Also needed:
test_entity: INDEX(parent_id, id)
Assuming that test_relation is a many:many mapping table, it is likely that you will also need (though not necessarily for the current query):
INDEX(entity_id, entry_id)
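For reference, a hedged sketch of those suggestions as DDL; the index names are made up, and the new (stamp, entity_id, entry_id) index would replace ix_test_relation rather than sit alongside it:
CREATE INDEX ix_test_relation_stamp ON test_relation(stamp, entity_id, entry_id);
CREATE INDEX ix_test_entity_parent ON test_entity(parent_id, id);
CREATE INDEX ix_test_relation_remap ON test_relation(entity_id, entry_id);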

Why PostgreSQL CTE with DELETE is not working?

I was trying to delete a record from my stock table when an update on the same table results in quantity 0, using two CTEs.
The upserts are working, but the delete is not producing the result I was expecting: the quantity in the stock table is changing to zero, but the record is not being deleted.
Table structure:
CREATE TABLE IF NOT EXISTS stock_location (
stock_location_id SERIAL
, site_code VARCHAR(10) NOT NULL
, location_code VARCHAR(50) NOT NULL
, status CHAR(1) NOT NULL DEFAULT 'A'
, CONSTRAINT pk_stock_location PRIMARY KEY (stock_location_id)
, CONSTRAINT ui_stock_location__keys UNIQUE (site_code, location_code)
);
CREATE TABLE IF NOT EXISTS stock (
stock_id SERIAL
, stock_location_id INT NOT NULL
, item_code VARCHAR(50) NOT NULL
, quantity FLOAT NOT NULL
, CONSTRAINT pk_stock PRIMARY KEY (stock_id)
, CONSTRAINT ui_stock__keys UNIQUE (stock_location_id, item_code)
, CONSTRAINT fk_stock__stock_location FOREIGN KEY (stock_location_id)
REFERENCES stock_location (stock_location_id)
ON DELETE CASCADE ON UPDATE CASCADE
);
This is what the statement looks like:
WITH stock_location_upsert AS (
INSERT INTO stock_location (
site_code
, location_code
, status
) VALUES (
inSiteCode
, inLocationCode
, inStatus
)
ON CONFLICT ON CONSTRAINT ui_stock_location__keys
DO UPDATE SET
status = inStatus
RETURNING stock_location_id
)
, stock_upsert AS (
INSERT INTO stock (
stock_location_id
, item_code
, quantity
)
SELECT
slo.stock_location_id
, inItemCode
, inQuantity
FROM stock_location_upsert slo
ON CONFLICT ON CONSTRAINT ui_stock__keys
DO UPDATE SET
quantity = stock.quantity + inQuantity
RETURNING stock_id, quantity
)
DELETE FROM stock stk
USING stock_upsert stk2
WHERE stk.stock_id = stk2.stock_id
AND stk.quantity = 0;
Does anyone know what's going on?
This is an example of what I'm trying to do:
DROP TABLE IF EXISTS test1;
CREATE TABLE IF NOT EXISTS test1 (
id serial
, code VARCHAR(10) NOT NULL
, description VARCHAR(100) NOT NULL
, quantity INT NOT NULL
, CONSTRAINT pk_test1 PRIMARY KEY (id)
, CONSTRAINT ui_test1 UNIQUE (code)
);
-- UPSERT
WITH test1_upsert AS (
INSERT INTO test1 (
code, description, quantity
) VALUES (
'01', 'DESC 01', 1
)
ON CONFLICT ON CONSTRAINT ui_test1
DO UPDATE SET
description = 'DESC 02'
, quantity = 0
RETURNING test1.id, test1.quantity
)
DELETE FROM test1
USING test1_upsert
WHERE test1.id = test1_upsert.id
AND test1_upsert.quantity = 0;
The second time the UPSERT command runs, it should delete the record from test1 once the quantity has been updated to zero.
Makes sense?
Here, DELETE is working the way it was designed to work. The answer is actually pretty straightforward and documented; I experienced the same behaviour years ago.
The reason your delete is not actually removing the data is that your WHERE condition doesn't match what's stored in the table, as far as the DELETE statement can see.
All sub-statements within a CTE (Common Table Expression) are executed with the same snapshot of data, so they can't see another statement's effect on the target table. In this case, when you run UPDATE and then DELETE, the DELETE statement sees the same data the UPDATE did, and doesn't see the updated data that the UPDATE statement modified.
How can you work around that? You need to separate the UPDATE and DELETE into two independent statements (a sketch follows below).
If you need to pass information about what to delete between the two statements, you could for example (1) create a temporary table and insert the primary keys of the updated rows so that you can join to it in the later query (DELETE based on the data that was UPDATEd), (2) achieve the same result by simply adding a column to the updated table and changing its value to mark updated rows, or (3) do whatever else gets the job done. The examples above should give you a feeling for what needs to be done.
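A minimal sketch of that separation, reusing the test1 example from the question; because the DELETE runs as its own statement, it sees the zero quantity left behind by the upsert:
-- First statement: the upsert on its own.
INSERT INTO test1 (code, description, quantity)
VALUES ('01', 'DESC 01', 1)
ON CONFLICT ON CONSTRAINT ui_test1
DO UPDATE SET description = 'DESC 02', quantity = 0;
-- Second, independent statement: the updated quantity is now visible.
DELETE FROM test1
WHERE code = '01'
  AND quantity = 0;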
Quoting the manual to support my findings:
7.8.2. Data-Modifying Statements in WITH
The sub-statements in WITH are executed concurrently with each other
and with the main query. Therefore, when using data-modifying
statements in WITH, the order in which the specified updates actually
happen is unpredictable. All the statements are executed with the same
snapshot (see Chapter 13), so they cannot “see” one another's effects
on the target tables.
(...)
This also applies to deleting a row that was already updated in the same statement: only the update is performed
Adding to the helpful explanation above... Whenever possible it is absolutely best to break out modifying procedures into their own statements.
However, when the CTE has multiple modifying statements that reference the same subquery, and temporary tables are not ideal (such as in stored procedures), then you just need a good solution.
In that case if you'd like a simple trick about how to go about ensuring a bit of order, consider this example:
WITH
to_insert AS
(
SELECT
*
FROM new_values
)
, first AS
(
DELETE FROM some_table
WHERE
id in (SELECT id FROM to_insert)
RETURNING *
)
INSERT INTO some_other_table
SELECT * FROM new_values
WHERE
exists (SELECT count(*) FROM first)
;
The trick here is the exists (SELECT count(*) FROM first) part, which must be executed before the insert can happen. This is a way (which I wouldn't consider too hacky) to enforce an order while keeping everything within one CTE.
But this is just the concept - there are more optimal ways of doing the same thing for a given context.

Create a field in Firebird which displays data from another table

I didn't find a working solution for creating a "lookup column" in a Firebird database.
Here is an example:
Table1: Orders
[OrderID] [CustomerID] [CustomerName]
Table2: Customers
[ID] [Name]
When I run SELECT * FROM ORDERS I want to get OrderID, CustomerID and CustomerName, but CustomerName should automatically be computed by looking up the "CustomerID" in the "ID" column of the "Customers" table and returning the content of the "Name" column.
Firebird has calculated fields (generated always as/computed by), and these allow selecting from other tables (contrary to an earlier version of this answer, which stated that Firebird doesn't support this).
However, I suggest you use a view instead, as I think it performs better (haven't verified this, so I suggest you test this if performance is important).
Use a view
The common way would be to define a base table and an accompanying view that gathers the necessary data at query time. Instead of using the base table, people would query from the view.
create view order_with_customer
as
select orders.id, orders.customer_id, customer.name
from orders
inner join customer on customer.id = orders.customer_id;
Or you could just skip the view and use above join in your own queries.
Alternative: calculated fields
I label this as an alternative and not the main solution, as I think using a view would be the preferable solution.
To use calculated fields, you can use the following syntax (note the double parentheses around the query):
create table orders (
id integer generated by default as identity primary key,
customer_id integer not null references customer(id),
customer_name generated always as ((select name from customer where id = customer_id))
)
Updates to the customer table will be automatically reflected in the orders table.
As far as I'm aware, the performance of this option is less than when using a join (as used in the view example), but you might want to test that for yourself.
FB3+ with function
With Firebird 3, you can also create calculated fields using a function; this makes the expression itself shorter.
To do this, create a function that selects from the customer table:
create function lookup_customer_name(customer_id integer)
returns varchar(50)
as
begin
return (select name from customer where id = :customer_id);
end
And then create the table as:
create table orders (
id integer generated by default as identity primary key,
customer_id integer not null references customer(id),
customer_name generated always as (lookup_customer_name(customer_id))
);
Updates to the customer table will be automatically reflected in the orders table. This solution can be relatively slow when selecting a lot of records, as the function will be executed for each row individually, which is a lot less efficient than performing a join.
Alternative: use a trigger
However if you want to update the table at insert (or update) time with information from another table, you could use a trigger.
I'll be using Firebird 3 for my answer, but it should translate - with some minor differences - to earlier versions as well.
So assuming a table customer:
create table customer (
id integer generated by default as identity primary key,
name varchar(50) not null
);
with sample data:
insert into customer(name) values ('name1');
insert into customer(name) values ('name2');
And a table orders:
create table orders (
id integer generated by default as identity primary key,
customer_id integer not null references customer(id),
customer_name varchar(50) not null
)
You then define a trigger:
create trigger orders_bi_bu
active before insert or update
on orders
as
begin
new.customer_name = (select name from customer where id = new.customer_id);
end
Now when we use:
insert into orders(customer_id) values (1);
the result is:
id customer_id customer_name
1 1 name1
Update:
update orders set customer_id = 2 where id = 1;
Result:
id customer_id customer_name
1 2 name2
The downside of a trigger is that updating the name in the customer table will not automatically be reflected in the orders table. You would need to keep track of these dependencies yourself, and create an after update trigger on customer that updates the dependent records, which can lead to update/lock conflicts.
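A rough sketch of such an after-update trigger on customer (the trigger name is an assumption, and the update/lock conflict caveat above still applies):
create trigger customer_au
active after update
on customer
as
begin
  -- propagate name changes to the dependent orders rows
  if (new.name is distinct from old.name) then
    update orders set customer_name = new.name
    where customer_id = new.id;
end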
No need for a complex lookup field here.
No need to add a persistent field [CustomerName] to Table1.
As Gordon said, a simple join is enough:
Select T1.OrderID, T2.ID, T2.Name
From Customers T2
Join Orders T1 On T1.CustomerID = T2.ID
That said, if you want to use lookup fields (as we do on a Dataset) with SQL, you can use something like:
Select T1.OrderID, T2.ID,
( Select T3.YourLookupField From T3 Where T3.ID = T2.ID )
From Customers T2 Join Orders T1 On T1.CustomerID = T2.ID
Regards.