SQL Teradata Delete performance - sql

I have a simple but large table in TD:
CREATE SET TABLE TABLE1 ,FALLBACK ,
NO BEFORE JOURNAL,
NO AFTER JOURNAL,
CHECKSUM = DEFAULT,
DEFAULT MERGEBLOCKRATIO,
MAP = TD_MAP1
(
party INTEGER,
cd SMALLINT)
PRIMARY INDEX ( party );
I need to perform a delete:
DELETE FROM TABLE1 WHERE cd<0 AND cd<>-212;
Tried adding NUSI to cd:
CREATE INDEX (cd) ON TABLE1 ;
But that did not help.
Any advice on how to increase performance?
Thanks, R.
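A minimal sketch of two checks that often come first in this situation, assuming standard Teradata syntax: collect statistics on cd so the optimizer can judge whether the NUSI helps, and look at the plan the optimizer produces for the delete.
-- collect column statistics so the optimizer can evaluate the NUSI on cd
COLLECT STATISTICS ON TABLE1 COLUMN (cd);
-- inspect the plan chosen for the delete
EXPLAIN
DELETE FROM TABLE1 WHERE cd < 0 AND cd <> -212;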

Related

IBM Informix: getting 245, 144 error doing a select while another transaction has done an insert - possible bug?

Encountered this problem in production in the form of a deadlock. Figured out that if a transaction was inserting a row on my table, and I wanted to select a totally different row from that table, I would get the following error:
245: Could not position within a file via an index.
144: ISAM error: key value locked
Error in line 1
Near character position 70
My select statement was of the form select * from table where bar = 3 and foo = "CCCC";, where "foo" is a foreign key to a table with 18 rows, and "bar" is the first table's primary key. My insert statement was also inserting a row with foo = "CCCC". Curiously, the select query also returned the desired row before outputting the error.
I tried all this on Informix 12.10 with the isolation level set to repeatable read. I tried it in production and in a fresh DB I set up with only the two tables mentioned. The lock mode of both tables is "row".
I investigated by modifying the select statement: select * from table where bar = 3; would not fail. Also, select * from table where bar = 3 and foo = "CCCC" order by ber; would not fail (ber being a random field from the table, ber is not indexed).
I would expect all the select statements I tried to return the desired row without error, OR all of them to fail. My solution in production was to order by a random field in the table, which fixed the deadlock issue.
Does anyone know why this issue could have happened? I suspect it is linked to the indexes on the table, which were all created automatically when adding the primary and foreign keys to the table. But I do not know enough about indexes to understand what happened. Could this be a bug?
Schema of the tables:
create table options (
foo char(4) not null,
fee int not null)
extent size 16 next size 16
lock mode row;
alter table options add constraint (
primary key (foo)
constraint cons1 );
create table decisions (
bar char(3) not null,
foo char(4) not null,
ber int not null)
extent size 131072 next size 65536
lock mode row;
alter table decisions add constraint (
primary key (bar)
constraint cons2 );
alter table decisions add constraint (
foreign key (foo) references options(foo)
constraint cons3 );
Data I inserted into the "options" table:
AAAA|0|
BBBB|0|
CCCC|1|
DDDD|4|
EEEE|1|
FFFF|8|
Data I inserted into the "decisions" table:
QWE|AAAA|0|
WER|AAAA|9|
ERT|CCCC|2|
RTY|AAAA|32|
TYU|CCCC|1234|
YUI|CCCC|42398|
UIO|AAAA|23178|
IOP|CCCC|1233|
OPA|CCCC|11|
PAS|AAAA|890|
ASD|AAAA|90|
SDF|CCCC|2|
DFG|AAAA|4|
FGH|CCCC|7|
Edit: I used set explain on; for the queries.
For select * from decisions where foo = "CCCC" and bar = "QWE" order by foo; the plan showed the index on foo being used (foo="CCCC"). However, for select * from decisions where foo = "CCCC" and bar = "QWE" order by ber; it used the index on bar (bar="QWE").
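For completeness, a sketch of how the plan can be captured in Informix without executing the statement (AVOID_EXECUTE is assumed to be available in 12.10; the plan is written to sqexplain.out):
-- write the optimizer's plan to sqexplain.out without running the query
SET EXPLAIN ON AVOID_EXECUTE;
SELECT * FROM decisions WHERE bar = "QWE" AND foo = "CCCC";
SET EXPLAIN OFF;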

Delete a record based on multiple table choices SQL

I'm trying to wrap my head around how to accomplish this DELETE query. The goal is to delete a client record (main table) if the client has no insurance policy (another table) and has a need (a third table) whose description is "transportation" with an importance value of LESS than 5. The tables are all connected with foreign keys, using SSN as the connector, and DELETE CASCADE is working properly. The query is partially working as is: if there is no insurance policy, the client is deleted correctly. However, the need description and importance value conditions are not working; a client with no insurance policy is still deleted even when the need description is something other than "transportation".
It's almost like I need two subqueries to compare both the Needs table and the Insurance_Policy table for deletion, but I don't know how to do that.
The database tool I'm using is Azure Data Studio.
Here is my current Procedure code:
DROP PROCEDURE IF EXISTS Option17;
GO
CREATE PROCEDURE Option17
AS
BEGIN
DELETE FROM Client
WHERE Client.SSN NOT IN (SELECT I.SSN
FROM Insurance_Policy I, Needs N
WHERE Client.SSN = I.SSN
AND Client.SSN = N.SSN
AND N.need_description = 'transportation'
AND N.importance_value < 5)
END
Also, here are my table structures:
CREATE TABLE Client
(
SSN VARCHAR(9),
doctor_name VARCHAR(60),
doctor_phone_no VARCHAR(10),
lawyer_name VARCHAR(60),
lawyer_phone_no VARCHAR(10),
date_assigned DATE,
PRIMARY KEY (SSN),
FOREIGN KEY (SSN) REFERENCES Person
ON DELETE CASCADE
);
CREATE TABLE Insurance_Policy
(
policy_id VARCHAR(10),
provider_id VARCHAR(10),
provider_address VARCHAR(100),
insurance_type VARCHAR(10),
SSN VARCHAR(9),
PRIMARY KEY (policy_id),
FOREIGN KEY (SSN) REFERENCES Client
);
CREATE TABLE Needs
(
SSN VARCHAR(9),
need_description VARCHAR(60),
importance_value INT CHECK(importance_value > 0 and importance_value <11),
PRIMARY KEY(SSN,need_description),
FOREIGN KEY(SSN) REFERENCES Client
ON DELETE CASCADE
);
Based on your answers, I believe this is the code you are looking for. If this is not working, let me know.
To explain a little: using an INNER JOIN eliminates the need for a couple of those WHERE conditions, because an INNER JOIN only returns records that exist in both tables. There is also no need to reference the Client table from inside the subquery.
Also, the requirement is to delete clients whose need description is 'transportation' with an importance of less than 5. Since the subquery builds the list of clients to leave alone, those records should not be included in it.
DROP PROC IF EXISTS Option17;
GO
Create proc Option17
AS
BEGIN
DELETE FROM Client
WHERE SSN NOT IN (
SELECT
N.SSN
FROM Needs N
INNER JOIN Insurance_Policy I ON N.SSN = I.SSN
WHERE NOT (N.need_description = 'transportation' AND N.importance_value < 5)
);
END
GO
I think you want separate conditions on Needs and Insurance_Policy. And I recommend NOT EXISTS, because it better handles NULL values:
DELETE c
FROM Client c
WHERE NOT EXISTS (SELECT 1
FROM Insurance_Policy i
WHERE c.SSN = i.SSN
) AND
EXISTS (SELECT 1
FROM Needs n
WHERE c.SSN = n.SSN AND
n.need_description = 'transportation' AND
n.importance_value < 5
);
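If this needs to live inside the Option17 procedure from the question, a sketch of the same statement wrapped in it (schema names as posted above):
DROP PROCEDURE IF EXISTS Option17;
GO
CREATE PROCEDURE Option17
AS
BEGIN
    -- delete clients with no insurance policy and a low-importance transportation need
    DELETE c
    FROM Client c
    WHERE NOT EXISTS (SELECT 1 FROM Insurance_Policy i WHERE c.SSN = i.SSN)
      AND EXISTS (SELECT 1
                  FROM Needs n
                  WHERE c.SSN = n.SSN
                    AND n.need_description = 'transportation'
                    AND n.importance_value < 5);
END
GO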

Optimizing SQL query on table of 10 million rows: neverending query

I have two tables:
CREATE TABLE routing
(
id integer NOT NULL,
link_geom geometry,
source integer,
target integer,
traveltime_min double precision,
CONSTRAINT routing_pkey PRIMARY KEY (id)
)
WITH (
OIDS=FALSE
);
CREATE INDEX routing_id_idx
ON routing
USING btree
(id);
CREATE INDEX routing_link_geom_gidx
ON routing
USING gist
(link_geom);
CREATE INDEX routing_source_idx
ON routing
USING btree
(source);
CREATE INDEX routing_target_idx
ON routing
USING btree
(target);
and
CREATE TABLE test
(
link_id character varying,
link_geom geometry,
id integer NOT NULL,
.. (some more attributes here)
traveltime_min double precision,
CONSTRAINT id PRIMARY KEY (id),
CONSTRAINT test_link_id_key UNIQUE (link_id)
)
WITH (
OIDS=FALSE
);
ALTER TABLE test
OWNER TO postgres;
and I am trying to apply the following query:
update routing
set traveltime_min = t2.traveltime_min
from test t2
where t2.id = routing.id
Both tables have nearly 10 million rows. The problem is that this query runs forever and never finishes. Here is what EXPLAIN shows:
Update on routing (cost=601725.94..1804772.15 rows=9712264 width=208)
-> Hash Join (cost=601725.94..1804772.15 rows=9712264 width=208)
Hash Cond: (routing.id = t2.id)
-> Seq Scan on routing (cost=0.00..366200.23 rows=9798223 width=194)
-> Hash (cost=423414.64..423414.64 rows=9712264 width=18)
-> Seq Scan on test t2 (cost=0.00..423414.64 rows=9712264 width=18)
I cannot understand what might be causing such a slow response.
Could it be a problem with the server settings? The thing is that I use the default PostgreSQL 9.3 settings.
Drop all indexes on routing before you run the UPDATE and add them again afterwards. That will bring a huge improvement.
Set work_mem high in the session where you run the UPDATE. That will help with the hash.
Set shared_buffers to ¼ of the available memory, but not more than 1GB.
If not all the rows are actually changed by the UPDATE (i.e. they get the same value as they already had), you should avoid these idempotent updates.
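A rough sketch of that sequence, using the index names from the DDL in the question (the work_mem value is only a placeholder to adjust for the machine):
-- drop the secondary indexes so the UPDATE does not have to maintain them
DROP INDEX routing_source_idx;
DROP INDEX routing_target_idx;
DROP INDEX routing_link_geom_gidx;
DROP INDEX routing_id_idx;  -- redundant with the primary key on id anyway

SET work_mem = '512MB';     -- session-level; helps the hash join

UPDATE routing
SET traveltime_min = t2.traveltime_min
FROM test t2
WHERE t2.id = routing.id;

-- recreate the indexes afterwards
CREATE INDEX routing_source_idx ON routing USING btree (source);
CREATE INDEX routing_target_idx ON routing USING btree (target);
CREATE INDEX routing_link_geom_gidx ON routing USING gist (link_geom);
Note that routing_id_idx is not recreated here, because the primary key constraint already provides an index on id.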
If you expect the query to affect every row, the query plan is not important. [Except, maybe, for the case of overflowing hash tables ...]
-- these could be needed if the update would be more selective...
VACUUM analyze routing;
VACUUM analyze test;
UPDATE routing dst
SET traveltime_min = src.traveltime_min
FROM test src
WHERE dst.id = src.id
-- avoid useless updates and row-versions
AND dst.traveltime_min IS DISTINCT FROM src.traveltime_min
;
-- VACUUM analyze routing;

SQLite performance tuning for paginated fetches

I am trying to optimize the query I use for fetching paginated data from database with large data sets.
My schema looks like this:
CREATE TABLE users (
user_id TEXT PRIMARY KEY,
name TEXT,
custom_fields TEXT
);
CREATE TABLE events (
event_id TEXT PRIMARY KEY,
organizer_id TEXT NOT NULL REFERENCES users(user_id) ON DELETE SET NULL ON UPDATE CASCADE,
name TEXT NOT NULL,
type TEXT NOT NULL,
start_time INTEGER,
duration INTEGER
-- more columns here, omitted for the sake of simplicity
);
CREATE INDEX events_organizer_id_start_time_idx ON events(organizer_id, start_time);
CREATE INDEX events_organizer_id_type_idx ON events(organizer_id, type);
CREATE INDEX events_organizer_id_type_start_time_idx ON events(organizer_id, type, start_time);
CREATE INDEX events_type_start_time_idx ON events(type, start_time);
CREATE INDEX events_start_time_desc_idx ON events(start_time DESC);
CREATE INDEX events_start_time_asc_idx ON events(IFNULL(start_time, 253402300800) ASC);
CREATE TABLE event_participants (
participant_id TEXT NOT NULL REFERENCES users(user_id) ON DELETE CASCADE ON UPDATE CASCADE,
event_id TEXT NOT NULL REFERENCES events(event_id) ON DELETE CASCADE ON UPDATE CASCADE,
role INTEGER NOT NULL DEFAULT 0,
UNIQUE (participant_id, event_id) ON CONFLICT REPLACE
);
CREATE INDEX event_participants_participant_id_event_id_idx ON event_participants(participant_id, event_id);
CREATE INDEX event_participants_event_id_idx ON event_participants(event_id);
CREATE TABLE event_tag_maps (
event_id TEXT NOT NULL REFERENCES events(event_id) ON DELETE CASCADE ON UPDATE CASCADE,
tag_id TEXT NOT NULL,
PRIMARY KEY (event_id, tag_id) ON CONFLICT IGNORE
);
CREATE INDEX event_tag_maps_event_id_tag_id_idx ON event_tag_maps(event_id, tag_id);
Where in events table I have around 1,500,000 entries, and around 2,000,000 in event_participants.
Now, a typical query would look something like:
SELECT
EVTS.event_id,
EVTS.type,
EVTS.name,
EVTS.start_time,
EVTS.duration
FROM events AS EVTS
WHERE
EVTS.organizer_id IN(
'f39c3bb1-3ee3-11e6-a0dc-005056c00008',
'4555e70f-3f1d-11e6-a0dc-005056c00008',
'6e7e33ae-3f1c-11e6-a0dc-005056c00008',
'4850a6a0-3ee4-11e6-a0dc-005056c00008',
'e06f784c-3eea-11e6-a0dc-005056c00008',
'bc6a0f73-3f1d-11e6-a0dc-005056c00008',
'68959fb5-3ef3-11e6-a0dc-005056c00008',
'c4c96cf2-3f1a-11e6-a0dc-005056c00008',
'727e49d1-3f1b-11e6-a0dc-005056c00008',
'930bcfb6-3f09-11e6-a0dc-005056c00008')
AND EVTS.type IN('Meeting', 'Conversation')
AND(
EXISTS (
SELECT 1 FROM event_tag_maps AS ETM WHERE ETM.event_id = EVTS.event_id AND
ETM.tag_id IN ('00000000-0000-0000-0000-000000000000', '6ae6870f-1aac-11e6-aeb9-005056c00008', '6ae6870c-1aac-11e6-aeb9-005056c00008', '1f6d3ccb-eaed-4068-a46b-ec2547fec1ff'))
OR NOT EXISTS (
SELECT 1 FROM event_tag_maps AS ETM WHERE ETM.event_id = EVTS.event_id)
)
AND EXISTS (
SELECT 1 FROM event_participants AS EPRTS
WHERE
EVTS.event_id = EPRTS.event_id
AND participant_id NOT IN('79869516-3ef2-11e6-a0dc-005056c00008', '79869515-3ef2-11e6-a0dc-005056c00008', '79869516-4e18-11e6-a0dc-005056c00008')
)
ORDER BY IFNULL(EVTS.start_time, 253402300800) ASC
LIMIT 100 OFFSET #Offset;
Also, for fetching the overall count of the query-matching items, I would use the above query with count(1) instead of the columns and without the ORDER BY and LIMIT/OFFSET clauses.
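For illustration, a condensed sketch of that count variant (only the organizer/type filters are spelled out; the remaining predicates are identical to the paginated query above):
-- the EXISTS / NOT EXISTS predicates on event_tag_maps and
-- event_participants are the same as in the paginated query
SELECT count(1)
FROM events AS EVTS
WHERE EVTS.organizer_id IN ('f39c3bb1-3ee3-11e6-a0dc-005056c00008',
                            '4555e70f-3f1d-11e6-a0dc-005056c00008')
  AND EVTS.type IN ('Meeting', 'Conversation');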
I experience two main problems here:
1) The performance drastically decreases as I increase the #Offset value. The difference is very significant: from almost immediate to several seconds.
2) The count query takes a long time (several seconds) and produces the following execution plan:
0|0|0|SCAN TABLE events AS EVTS
0|0|0|EXECUTE LIST SUBQUERY 1
0|0|0|EXECUTE LIST SUBQUERY 1
0|0|0|EXECUTE CORRELATED SCALAR SUBQUERY 1
1|0|0|SEARCH TABLE event_tag_maps AS ETM USING COVERING INDEX event_tag_maps_event_id_tag_id_idx (event_id=? AND tag_id=?)
1|0|0|EXECUTE LIST SUBQUERY 2
0|0|0|EXECUTE CORRELATED SCALAR SUBQUERY 2
2|0|0|SEARCH TABLE event_tag_maps AS ETM USING COVERING INDEX event_tag_maps_event_id_tag_id_idx (event_id=?)
0|0|0|EXECUTE CORRELATED SCALAR SUBQUERY 3
3|0|0|SEARCH TABLE event_participants AS EPRTS USING INDEX event_participants_event_id_idx (event_id=?)
What I don't understand here is why a full table scan is performed instead of an index scan.
Additional info and SQLite settings used (see the PRAGMA sketch after this list):
I use System.Data.SQLite provider (have to, because of custom functions support)
Page size = cluster size (4096 in my case)
Cache size = 100000
Journal mode = WAL
Temp store = 2 (memory)
No transaction is open for the query
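Expressed as the corresponding PRAGMA statements (a sketch using the values from the list above):
PRAGMA page_size = 4096;     -- must be set before the database is populated
PRAGMA cache_size = 100000;  -- positive value = number of pages
PRAGMA journal_mode = WAL;
PRAGMA temp_store = 2;       -- 2 = MEMORY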
Is there anything I could do to change the query/schema or settings in order to get as much performance improvement as possible?

Restoring a Truncated Table from a Backup

I am restoring the data of a truncated table in an Oracle Database from an exported csv file. However, I find that the primary key auto-increments and does not insert the actual values of the primary key constrained column from the backed up file.
I intend to do the following:
1. drop the primary key
2. import the table data
3. add primary key constraints on the required column
Is this a good approach? If not, what is recommended? Thanks.
EDIT: After more investigation, I observed there's a trigger to generate nextval on a sequence to be inserted into the primary key column. This is the source of the predicament. Hence, following the procedure above would not solve the problem. It lies in the trigger (and/or sequence) on the table. This is solved!
It is easier to use your .csv as an external table and then:
1. create table your_table_temp as select * from external table
2. examine the data in the new temp table to ensure you know what range of primary keys is present
3. do a merge into the new table
Samples of an external table definition and of a merge:
CREATE TABLE countries_ext (
country_code VARCHAR2(5),
country_name VARCHAR2(50),
country_language VARCHAR2(50)
)
ORGANIZATION EXTERNAL (
TYPE ORACLE_LOADER
DEFAULT DIRECTORY ext_tab_data
ACCESS PARAMETERS (
RECORDS DELIMITED BY NEWLINE
FIELDS TERMINATED BY ','
MISSING FIELD VALUES ARE NULL
(
country_code CHAR(5),
country_name CHAR(50),
country_language CHAR(50)
)
)
LOCATION ('Countries1.txt','Countries2.txt')
)
PARALLEL 5
REJECT LIMIT UNLIMITED;
and the merge
MERGE INTO employees e
USING hr_records h
ON (e.id = h.emp_id)
WHEN MATCHED THEN
UPDATE SET e.address = h.address
WHEN NOT MATCHED THEN
INSERT (id, address)
VALUES (h.emp_id, h.address);
Edit: after you have merged the data you can drop the temp table, and the result is your previous table with the old data and the new data together.
Edit: you mention "During imports, the primary key column does not insert from the file, but auto-increments". This can only happen when there is a trigger on the table, most likely a BEFORE INSERT ... FOR EACH ROW trigger. Disable the trigger and then do your import. Re-enable the trigger after committing your inserts.
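A sketch of that disable/re-enable step (trigger_name is a placeholder, like the generic names used below):
-- stop the BEFORE INSERT trigger from overwriting the primary key values
ALTER TRIGGER trigger_name DISABLE;
-- ... run the import here ...
COMMIT;
-- restore normal behaviour once the inserts are committed
ALTER TRIGGER trigger_name ENABLE;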
I used the following procedure to solve it:
1. drop trigger trigger_name
2. import the table data into the target table
3. drop sequence sequence_name
4. recreate the sequence: CREATE SEQUENCE SEQ_NAME INCREMENT BY 1 START WITH start_index_for_next_val MAXVALUE max_val MINVALUE 1 NOCYCLE CACHE 20 NOORDER (see the sketch after the trigger for choosing start_index_for_next_val)
5. recreate the trigger:
CREATE OR REPLACE TRIGGER "schema_name"."trigger_name"
before insert on target_table
for each row
begin
select seq_name.nextval
into :new.unique_column_name
from dual;
end;
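And a sketch of how start_index_for_next_val in step 4 can be chosen after the import (placeholder names as above):
-- next value the sequence should hand out after the import
SELECT MAX(unique_column_name) + 1 FROM target_table;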