Index Guidance for SQL Query

Anyone have guidance on how to approach building indexes for the following query? The query works as expected, but I can't seem to get around full table scans. Working with Oracle 11g.
SELECT v.volume_id
FROM (SELECT MIN(usv.volume_id) volume_id
      FROM user_stage_volume usv
      WHERE usv.status = 'NEW'
        AND NOT EXISTS
            (SELECT 1
             FROM user_stage_volume kusv
             WHERE kusv.deal_num = usv.deal_num
               AND kusv.locked = 'Y')
      GROUP BY usv.deal_num, usv.volume_type
      ORDER BY MAX(usv.priority) DESC, MIN(usv.last_update) ASC) v
WHERE ROWNUM = 1;
Please request any more info you may need in comments and I'll edit.
Here is the create script for the table. The PK is VOLUME_ID. DEAL_NUM is not unique.
CREATE TABLE ENDUR.USER_STAGE_VOLUME
(
    DEAL_NUM      NUMBER(38)         NOT NULL,
    EXTERNAL_ID   NUMBER(38)         NOT NULL,
    VOLUME_TYPE   NUMBER(38)         NOT NULL,
    EXTERNAL_TYPE VARCHAR2(100 BYTE) NOT NULL,
    GMT_START     DATE               NOT NULL,
    GMT_END       DATE               NOT NULL,
    VALUE         FLOAT(126)         NOT NULL,
    VOLUME_ID     NUMBER(38)         NOT NULL,
    PRIORITY      INTEGER            NOT NULL,
    STATUS        VARCHAR2(100 BYTE) NOT NULL,
    LAST_UPDATE   DATE               NOT NULL,
    LOCKED        CHAR(1 BYTE)       NOT NULL,
    RETRY_COUNT   INTEGER DEFAULT 0  NOT NULL,
    INS_DATE      DATE               NOT NULL
);

ALTER TABLE ENDUR.USER_STAGE_VOLUME ADD (
    PRIMARY KEY (VOLUME_ID));

An index on (deal_num) would help the subquery greatly. In fact, an index on (deal_num, locked) would allow the subquery to be answered from the index alone, avoiding the table altogether.
You should expect a full table scan on the main query, as it filters on status which is not indexed (and most likely would not benefit from being indexed, unless 'NEW' is a fairly rare value for status).
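A minimal sketch of the index suggested above (the name is illustrative):
CREATE INDEX usv_deal_locked_idx
    ON user_stage_volume (deal_num, locked);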

I think it's running your inner subquery (inside the NOT EXISTS) once for every row of the outer query.
That will be where performance takes a hit: it runs through all of user_stage_volume for each row in user_stage_volume, which is O(n^2), n being the number of rows in usv.
An alternative would be to create a view for the inner subquery and use that view, or to name a temporary result set using WITH.
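A rough sketch of the WITH version (untested; since deal_num is declared NOT NULL, the NOT IN here is safe):
WITH locked_deals AS
    (SELECT DISTINCT deal_num
     FROM user_stage_volume
     WHERE locked = 'Y')
SELECT v.volume_id
FROM (SELECT MIN(usv.volume_id) volume_id
      FROM user_stage_volume usv
      WHERE usv.status = 'NEW'
        AND usv.deal_num NOT IN (SELECT deal_num FROM locked_deals)
      GROUP BY usv.deal_num, usv.volume_type
      ORDER BY MAX(usv.priority) DESC, MIN(usv.last_update) ASC) v
WHERE ROWNUM = 1;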

Related

How to find the columns that need to be indexed?

I'm starting to learn SQL and relational databases. Below is the table that I have; it has around 10 million records. My composite key is (reltype, from_product_id, to_product_id).
What strategy should I follow when selecting the columns that need to be indexed? I have also documented the operations that will be performed on the table. Please help me determine which columns, or combinations of columns, need to be indexed.
Table DDL is shown below.
Table name: prod_rel.
Database schema name : public
CREATE TABLE public.prod_rel (
    reltype         varchar NULL,
    assocsequence   float4 NULL,
    action          varchar NULL,
    from_product_id varchar NOT NULL,
    to_product_id   varchar NOT NULL,
    status          varchar NULL,
    starttime       varchar NULL,
    endtime         varchar NULL,
    PRIMARY KEY (reltype, from_product_id, to_product_id)
);
Operations performed on table:
select distinct(reltype)
from public.prod_rel;

update public.prod_rel
set status = ?, starttime = ?
where from_product_id = ?;

update public.prod_rel
set status = ?, endtime = ?
where from_product_id = ?;

select *
from public.prod_rel
where from_product_id in (select distinct(from_product_id)
                          from public.prod_rel
                          where status = ?
                            and action in ('A', 'E', 'C', 'P')
                            and reltype = ?
                          fetch first 1000 rows only);
Note: I'm not performing any JOIN operations. Also, please ignore the casing of table and column names; I'm just getting started.
Ideal would be two indexes:
CREATE INDEX ON prod_rel (from_product_id);
CREATE INDEX ON prod_rel (status, reltype)
WHERE action IN ('A', 'E', 'C', 'P');
Your primary key (which is also implemented using an index) cannot support queries 2 and 3, because from_product_id is not at the beginning. If you redefine the primary key as (from_product_id, to_product_id, reltype), you don't need the first index I suggested.
Why does order matter? Imagine you are looking for a book in a library where the books are ordered by “last name, first name”. You can use this ordering to find all books by “Dickens” quickly, but not all books by any “Charles”.
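A sketch of that redefinition (assuming PostgreSQL's default constraint name prod_rel_pkey and no foreign keys referencing the old key):
ALTER TABLE prod_rel DROP CONSTRAINT prod_rel_pkey;
ALTER TABLE prod_rel ADD PRIMARY KEY (from_product_id, to_product_id, reltype);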
But let me also comment on your queries.
The first one will perform badly if there are lots of different reltype values; try raising work_mem in that case. It is always a sequential scan of the whole table, and no index can help.
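For example, at the session level (the value is only illustrative):
SET work_mem = '256MB';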
I have changed the order of the primary key columns as shown below, as per @a_horse_with_no_name's suggestion, and created only one index, on the (from_product_id, reltype, status, action) columns (sketched after the DDL below).
CREATE TABLE public.prod_rel (
    reltype         varchar NULL,
    assocsequence   float4 NULL,
    action          varchar NULL,
    from_product_id varchar NOT NULL,
    to_product_id   varchar NOT NULL,
    status          varchar NULL,
    starttime       varchar NULL,
    endtime         varchar NULL,
    PRIMARY KEY (from_product_id, to_product_id, reltype)
);
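For reference, a sketch of that single index (the name is illustrative):
CREATE INDEX prod_rel_from_product_id_idx
    ON prod_rel (from_product_id, reltype, status, action);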
Also, I have gone through the portal suggested by @a_horse_with_no_name. It was amazing; I came to know a lot of new things about indexing.
https://use-the-index-luke.com/

SQLite speed up select with collate nocase

I have SQLite db:
CREATE TABLE IF NOT EXISTS Commits
(
    GlobalVer INTEGER PRIMARY KEY,
    Data      blob NOT NULL
) WITHOUT ROWID;

CREATE TABLE IF NOT EXISTS Streams
(
    Name      char(40) NOT NULL,
    GlobalVer INTEGER NOT NULL,
    PRIMARY KEY(Name, GlobalVer)
) WITHOUT ROWID;
I want to make one select:
SELECT Commits.Data
FROM Streams JOIN Commits ON Streams.GlobalVer = Commits.GlobalVer
WHERE Streams.Name = ?
ORDER BY Streams.GlobalVer
LIMIT ? OFFSET ?
After that I want to make another select:
SELECT Commits.Data, Streams.Name
FROM Streams JOIN Commits ON Streams.GlobalVer = Commits.GlobalVer
WHERE Streams.Name = ? COLLATE NOCASE
ORDER BY Streams.GlobalVer
LIMIT ? OFFSET ?
The problem is that the 2nd select works super slowly. I think this is because of COLLATE NOCASE, and I want to speed it up. I tried to add an index, but it didn't help (maybe I did something wrong?). How can I execute the 2nd query at approximately the same speed as the 1st?
An index can be used to speed up a search only if it uses the same collation as the query.
By default, an index takes the collation from the table column, so you could change the table definition:
CREATE TABLE IF NOT EXISTS Streams
(
    Name      char(40) NOT NULL COLLATE NOCASE,
    GlobalVer INTEGER NOT NULL,
    PRIMARY KEY(Name, GlobalVer)
) WITHOUT ROWID;
However, this would make the first query slower.
To speed up both queries, you need two indexes, one for each collation: keep the original table definition so the implicit primary-key index uses the default collation, and add an explicit index with NOCASE:
CREATE TABLE IF NOT EXISTS Streams
(
    Name      char(40) NOT NULL,
    GlobalVer INTEGER NOT NULL,
    PRIMARY KEY(Name, GlobalVer)
) WITHOUT ROWID;

CREATE INDEX IF NOT EXISTS Streams_nocase_idx
    ON Streams(Name COLLATE NOCASE, GlobalVer);
(Adding the second column to the index speeds up the ORDER BY in this query.)
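To verify which index each query actually uses, you can prefix it with SQLite's EXPLAIN QUERY PLAN, for example:
EXPLAIN QUERY PLAN
SELECT Commits.Data, Streams.Name
FROM Streams JOIN Commits ON Streams.GlobalVer = Commits.GlobalVer
WHERE Streams.Name = ? COLLATE NOCASE
ORDER BY Streams.GlobalVer
LIMIT ? OFFSET ?;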

Optimizing GROUP BY in hsqldb

I have a table with 700K+ records on which a simple GROUP BY query takes in excess of 35 seconds to execute. I'm out of ideas on how to optimize this.
SELECT TOP 10 called_dn, COUNT(called_dn) FROM reportview.calls_out GROUP BY called_dn;
Here I add TOP 10 to limit network-transfer-induced delays.
I have an index on called_dn (hsqldb seems not to be using this).
called_dn is non nullable.
reportview.calls_out is a cached table.
Here's the table script:
CREATE TABLE calls_out (
    pk_global_call_id  INTEGER GENERATED BY DEFAULT AS SEQUENCE seq_global_call_id NOT NULL,
    sys_global_call_id VARCHAR(65),
    call_start         TIMESTAMP WITH TIME ZONE NOT NULL,
    call_end           TIMESTAMP WITH TIME ZONE NOT NULL,
    duration_interval  INTERVAL HOUR TO SECOND(0),
    duration_seconds   INTEGER,
    call_segments      INTEGER,
    calling_dn         VARCHAR(25) NOT NULL,
    called_dn          VARCHAR(25) NOT NULL,
    called_via_dn      VARCHAR(25),
    fk_end_status      INTEGER NOT NULL,
    fk_incoming_queue  INTEGER,
    call_start_year    INTEGER,
    call_start_month   INTEGER,
    call_start_week    INTEGER,
    call_start_day     INTEGER,
    call_start_hour    INTEGER,
    call_start_minute  INTEGER,
    call_start_second  INTEGER,
    utc_created        TIMESTAMP WITH TIME ZONE,
    created_by         VARCHAR(25),
    utc_modified       TIMESTAMP WITH TIME ZONE,
    modified_by        VARCHAR(25),
    PRIMARY KEY (pk_global_call_id),
    FOREIGN KEY (fk_incoming_queue) REFERENCES lookup_incoming_queue(pk_id),
    FOREIGN KEY (fk_end_status) REFERENCES lookup_end_status(pk_id));
Am I stuck with this kind of performance, or is there something I might try to speed up this query?
EDIT: Here's the query plan if it helps:
isDistinctSelect=[false]
isGrouped=[true]
isAggregated=[true]
columns=[ COLUMN: REPORTVIEW.CALLS_OUT.CALLED_DN not nullable
COUNT arg=[ COLUMN: REPORTVIEW.CALLS_OUT.CALLED_DN nullable]
[range variable 1
join type=INNER
table=CALLS_OUT
cardinality=771855
access=FULL SCAN
join condition = [index=SYS_IDX_SYS_PK_10173_10177]]]
groupColumns=[COLUMN: REPORTVIEW.CALLS_OUT.CALLED_DN]
offset=[VALUE = 0, TYPE = INTEGER]
limit=[VALUE = 10, TYPE = INTEGER]
PARAMETERS=[]
SUBQUERIES[]
Well, it seems there's no way to avoid a full scan in this situation.
Just for the reference of future souls reaching this question, here's what I resorted to in the end:
I created a summary table maintained by INSERT / DELETE triggers on the original table. This, in combination with suitable indexes and LIMIT USING INDEX clauses in my queries, yields very good performance.
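A rough sketch of that approach (untested; the table, trigger, and column names are illustrative, and a matching AFTER DELETE trigger would decrement the count):
CREATE TABLE calls_out_summary (
    called_dn  VARCHAR(25) PRIMARY KEY,
    call_count INTEGER DEFAULT 0 NOT NULL);

CREATE TRIGGER trg_calls_out_insert AFTER INSERT ON calls_out
    REFERENCING NEW ROW AS newrow
    FOR EACH ROW
    MERGE INTO calls_out_summary s
        USING (VALUES(newrow.called_dn)) AS v(dn) ON s.called_dn = v.dn
        WHEN MATCHED THEN UPDATE SET s.call_count = s.call_count + 1
        WHEN NOT MATCHED THEN INSERT VALUES (v.dn, 1);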

Bad SQLite query performance with outer joins

I have an SQLite database as part of an iOS app which works fine for the most part, but certain small changes to a query can result in it taking 1000x longer to complete. Here are the two tables involved:
create table "journey_item" ("id" SERIAL NOT NULL PRIMARY KEY,
"position" INTEGER NOT NULL,
"last_update" BIGINT NOT NULL,
"rank" DOUBLE PRECISION NOT NULL,
"skipped" BOOLEAN NOT NULL,
"item_id" INTEGER NOT NULL,
"journey_id" INTEGER NOT NULL);
create table "content_items" ("id" SERIAL NOT NULL PRIMARY KEY,
"full_id" VARCHAR(32) NOT NULL,
"title" VARCHAR(508),
"timestamp" BIGINT NOT NULL,
"item_size" INTEGER NOT NULL,
"http_link" VARCHAR(254),
"local_url" VARCHAR(254),
"creator_id" INTEGER NOT NULL,
"from_id" INTEGER,"location_id" INTEGER);
Tables have indexes on primary and foreign keys.
And here are 2 queries which give a good example of my problem
SELECT * FROM content_items ci
INNER JOIN journey_item ji ON ji.item_id = ci.id
WHERE ji.journey_id = 1

SELECT * FROM content_items ci
LEFT OUTER JOIN journey_item ji ON ji.item_id = ci.id
WHERE ji.journey_id = 1
The first query takes 167 ms to complete while the second takes 3.5 minutes and I don't know why the outer join would make such a huge difference.
Edit:
Without the WHERE part, the second query takes only 267 ms.
The two queries should have the same result set (the WHERE clause turns the LEFT JOIN into an INNER JOIN). However, SQLite probably doesn't recognize this.
If you have an index on journey_item(journey_id, item_id), then this would be used for the inner join version. However, the second version is probably scanning the first table for the join. An index on journey_item(item_id) would help, but probably still not match the performance of the first query.
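For reference, the two indexes mentioned would look like this (the names are illustrative):
CREATE INDEX journey_item_journey_idx ON journey_item (journey_id, item_id);
CREATE INDEX journey_item_item_idx ON journey_item (item_id);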

SQL JOIN To Find Records That Don't Have a Matching Record With a Specific Value

I'm trying to speed up some code that I wrote years ago for my employer's purchase authorization app. Basically I have a SLOW subquery that I'd like to replace with a JOIN (if it's faster).
When the director logs into the application he sees a list of purchase requests he has yet to authorize or deny. That list is generated with the following query:
SELECT * FROM SA_ORDER WHERE ORDER_ID NOT IN
(SELECT ORDER_ID FROM SA_SIGNATURES WHERE TYPE = 'administrative director');
There are only about 900 records in sa_order and 1,800 records in sa_signature, yet this query still takes about 5 seconds to execute. I've tried using a LEFT JOIN to retrieve the records I need, but I've only been able to get sa_order records with NO matching records in sa_signature, and I need sa_order records with no matching records of type 'administrative director'. Your help is greatly appreciated!
The schema for the two tables is as follows:
CREATE TABLE sa_order
(
`order_id` BIGINT PRIMARY KEY AUTO_INCREMENT,
`order_number` BIGINT NOT NULL,
`submit_date` DATE NOT NULL,
`vendor_id` BIGINT NOT NULL,
`DENIED` BOOLEAN NOT NULL DEFAULT FALSE,
`MEMO` MEDIUMTEXT,
`year_id` BIGINT NOT NULL,
`advisor` VARCHAR(255) NOT NULL,
`deleted` BOOLEAN NOT NULL DEFAULT FALSE
);
CREATE TABLE sa_signature
(
`signature_id` BIGINT PRIMARY KEY AUTO_INCREMENT,
`order_id` BIGINT NOT NULL,
`signature` VARCHAR(255) NOT NULL,
`proxy` BOOLEAN NOT NULL DEFAULT FALSE,
`timestamp` TIMESTAMP NOT NULL DEFAULT NOW(),
`username` VARCHAR(255) NOT NULL,
`type` VARCHAR(255) NOT NULL
);
Create an index on sa_signatures (type, order_id).
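In MySQL syntax that would be something like the following (the index name is illustrative; depending on MySQL version and character set, the VARCHAR(255) column may need a key prefix length):
CREATE INDEX idx_sa_signatures_type_order
    ON sa_signatures (type, order_id);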
This is not necessary to convert the query into a LEFT JOIN unless sa_signatures allows nulls in order_id. With the index, the NOT IN will perform as well. However, just in case you're curious:
SELECT o.*
FROM sa_order o
LEFT JOIN sa_signatures s
    ON s.order_id = o.order_id
    AND s.type = 'administrative director'
WHERE s.type IS NULL
You should pick a NOT NULL column from sa_signatures for the WHERE clause to perform well.
You could replace the [NOT] IN operator with [NOT] EXISTS for faster performance.
So you'll have:
SELECT * FROM SA_ORDER WHERE NOT EXISTS
    (SELECT ORDER_ID FROM SA_SIGNATURES
     WHERE TYPE = 'administrative director'
       AND ORDER_ID = SA_ORDER.ORDER_ID);
Reason : "When using “NOT IN”, the query performs nested full table scans, whereas for “NOT EXISTS”, query can use an index within the sub-query."
Source : http://decipherinfosys.wordpress.com/2007/01/21/32/
The following query should work; however, I suspect your real issue is that you don't have the proper indexes in place. You should have an index on the SA_SIGNATURES table on the ORDER_ID column.
SELECT *
FROM SA_ORDER
LEFT JOIN SA_SIGNATURES
    ON SA_ORDER.ORDER_ID = SA_SIGNATURES.ORDER_ID
    AND TYPE = 'administrative director'
WHERE SA_SIGNATURES.ORDER_ID IS NULL;
select * from sa_order as o inner join sa_signature as s on o.order_id = s.order_id and s.type = 'administrative director'
Also, you can create a non-clustered index on type in the sa_signature table.
Even better: have a master table for types, with typeid and typename, and then instead of saving the type as text in your sa_signature table, simply save the type as an integer. That's because computing on integers is much faster than computing on text.
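A minimal sketch of that lookup table (names are illustrative; sa_signature.type would then become an integer foreign key referencing type_id):
CREATE TABLE sa_signature_type
(
    `type_id`   BIGINT PRIMARY KEY AUTO_INCREMENT,
    `type_name` VARCHAR(255) NOT NULL UNIQUE
);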