Performance issue in postgresql - sql

When I run this query in Oracle it takes 0.04054 s, but the same query in PostgreSQL takes 49.8 min. How can I change the query to improve performance in PostgreSQL?
SELECT
"ID","IMAGE","TITLE","SERVICE_DESC"
,"STATUS", "ACTION","REMOVAL_TEXT","SERVICE_PROVIDER"
, "SERVICE_PROVIDER_NAME"
FROM (
SELECT DISTINCT "ID","IMAGE"
,"TITLE", "SERVICE_DESC"
, COALESCE("STATUS",'N') as "STATUS"
,"ACTION","REMOVAL_TEXT","CREATED_DT"
,"SERVICE_PROVIDER", "SERVICE_PROVIDER_NAME"
FROM MZP_ADP.ALL_SERVICE_DETAILS
WHERE "ZIP_CODE"='55005' AND "MAKE_LIVE" = 'Y'
AND "LOCATION_ID" = '2407605'
AND "END_DATE" > CURRENT_TIMESTAMP(0)::TIMESTAMP WITHOUT TIME ZONE
AND "IS_ACTIVE" = 'Y' order by "CREATED_DT" desc
) alias;

There can be a lot of problems (row counts, hardware, missing indexes).
First, what is the row count of the table?
Did you insert a lot of rows some time ago?
(If so, REINDEX TABLE table_name and VACUUM ANALYZE table_name may help.)
Check for indexes on these columns:
LOCATION_ID
ZIP_CODE
CREATED_DT
END_DATE
Why is the select wrapped in a subselect?
Please eliminate it.
Can you eliminate the DISTINCT with an additional WHERE clause?
Please share the plans and row counts, then we can say more.
EXPLAIN ANALYZE SELECT ...

You can try:
Create this index
create index ALL_SERVICE_DETAILS_CMP_INDEX on MZP_ADP.ALL_SERVICE_DETAILS ("ZIP_CODE", "MAKE_LIVE", "LOCATION_ID", "END_DATE", "IS_ACTIVE");
Remove parent select
Remove distinct (if there is at least one unique column in the select)
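A minimal sketch of the suggestions above, using Python's built-in sqlite3 module as a stand-in because it runs without a server. SQLite's planner is not PostgreSQL's, so this only illustrates the shape of the flattened query and one way to confirm that the composite index is picked up. The lowercase unquoted column names and the sample rows are assumptions. In PostgreSQL the equivalent check is EXPLAIN ANALYZE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE all_service_details (
    id INTEGER, title TEXT, status TEXT, created_dt TEXT,
    zip_code TEXT, make_live TEXT, location_id TEXT,
    end_date TEXT, is_active TEXT
);
-- Same column order as the index suggested above.
CREATE INDEX all_service_details_cmp_index
    ON all_service_details (zip_code, make_live, location_id, end_date, is_active);
""")

# Hypothetical sample rows: two matches, one expired, one in another zip code.
rows = [
    (1, "Lawn care", None, "2024-01-02", "55005", "Y", "2407605", "2999-01-01", "Y"),
    (2, "Plumbing", "A", "2024-01-03", "55005", "Y", "2407605", "2999-01-01", "Y"),
    (3, "Expired", "A", "2024-01-04", "55005", "Y", "2407605", "2000-01-01", "Y"),
    (4, "Other zip", "A", "2024-01-05", "10001", "Y", "2407605", "2999-01-01", "Y"),
]
conn.executemany(
    "INSERT INTO all_service_details VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)", rows
)

# No outer select and no DISTINCT; the ORDER BY sits on the only query level.
FLAT_QUERY = """
SELECT id, title, COALESCE(status, 'N') AS status
FROM all_service_details
WHERE zip_code = '55005' AND make_live = 'Y'
  AND location_id = '2407605'
  AND end_date > datetime('now')
  AND is_active = 'Y'
ORDER BY created_dt DESC
"""

result = conn.execute(FLAT_QUERY).fetchall()
# The fourth column of EXPLAIN QUERY PLAN holds the readable plan step.
plan = [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + FLAT_QUERY)]
print(result)
print(plan)
```

If the index name does not appear in the plan on the real database, the planner has chosen another access path, and the row counts and statistics are the next thing to check.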

Apply a few things to boost performance:
VACUUM FULL the tables (it also rebuilds the indexes). If in doubt, rebuild the indexes explicitly as well
VACUUM (FULL, ANALYZE) table_name;
REINDEX TABLE table_name;
Increase work_mem and maintenance_work_mem as per your memory and server
configuration
Try GROUP BY instead of DISTINCT (DISTINCT can be slower; compare the plans)
Remove the ORDER BY inside the subquery. If needed, use it outside
Create a composite index on the columns ZIP_CODE, LOCATION_ID, END_DATE and use
proper ordering in WHERE clause (As MAKE_LIVE and IS_ACTIVE are flag type so need
to add first in index)
Run EXPLAIN ANALYZE on the query to check the execution time and whether the proper index is used
Pseudocode:
SELECT columns
FROM (SELECT columns
FROM table
WHERE searching columns as per index creation
GROUP BY WITHOUT aggregated COLUMNS) t
ORDER BY columns -- if needed
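As a hedged illustration of the DISTINCT versus GROUP BY advice, the sketch below checks that the two forms return the same rows, so swapping one for the other is safe and only the plans need comparing. It uses Python's sqlite3 module purely as a runnable stand-in; the services table and its rows are hypothetical, and whether either form is faster on PostgreSQL has to be measured there with EXPLAIN ANALYZE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE services (id INTEGER, title TEXT, status TEXT)")
# The first two rows are exact duplicates on purpose.
conn.executemany(
    "INSERT INTO services VALUES (?, ?, ?)",
    [(1, "Lawn care", None), (1, "Lawn care", None), (2, "Plumbing", "A")],
)

distinct_rows = conn.execute(
    "SELECT DISTINCT id, title, COALESCE(status, 'N') "
    "FROM services ORDER BY id"
).fetchall()

# Grouping by every selected expression removes duplicates the same way.
group_rows = conn.execute(
    "SELECT id, title, COALESCE(status, 'N') "
    "FROM services "
    "GROUP BY id, title, COALESCE(status, 'N') ORDER BY id"
).fetchall()

print(distinct_rows)
print(group_rows)
```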

Related

Selecting the most optimal query

I have a table in an Oracle database, called my_table for example. It is a kind of log table. It has an incremental column named "id" and a "registration_number" column that is unique for registered users. Now I want to get the latest changes for registered users, so I wrote the queries below to accomplish this task:
First version:
SELECT t.*
FROM my_table t
WHERE t.id =
(SELECT MAX(id) FROM my_table t_m WHERE t_m.registration_number = t.registration_number
);
Second version:
SELECT t.*
FROM my_table t
INNER JOIN
( SELECT MAX(id) m_id FROM my_table GROUP BY registration_number
) t_m
ON t.id = t_m.m_id;
My first question is which of the above queries is recommended, and why? The second one: if there are sometimes about 70,000 inserts into this table, but mostly the number of inserted rows varies between 0 and 2,000, is it reasonable to add an index to this table?
An analytical query might be the fastest way to get the latest change for each registered user:
SELECT registration_number, id
FROM (
SELECT
registration_number,
id,
ROW_NUMBER() OVER (PARTITION BY registration_number ORDER BY id DESC) AS IDRankByUser
FROM my_table
)
WHERE IDRankByUser = 1
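The following sketch runs the two queries from the question and the analytic query side by side to confirm they pick the same "latest" row per user. It uses Python's sqlite3 module as a stand-in for Oracle (window functions need SQLite 3.25 or newer); the sample rows and the action column are assumptions, and it says nothing about which form is fastest on Oracle.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE my_table ("
    "id INTEGER PRIMARY KEY, registration_number INTEGER, action TEXT)"
)
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?)",
    [(1, 100, "created"), (2, 200, "created"), (3, 100, "updated"),
     (4, 300, "created"), (5, 200, "deleted")],
)

QUERIES = {
    # First version from the question: correlated subquery.
    "correlated": """
        SELECT t.id, t.registration_number
        FROM my_table t
        WHERE t.id = (SELECT MAX(id) FROM my_table t_m
                      WHERE t_m.registration_number = t.registration_number)
        ORDER BY t.id""",
    # Second version from the question: join against the grouped maxima.
    "join": """
        SELECT t.id, t.registration_number
        FROM my_table t
        INNER JOIN (SELECT MAX(id) AS m_id FROM my_table
                    GROUP BY registration_number) t_m
            ON t.id = t_m.m_id
        ORDER BY t.id""",
    # Analytic version: rank rows per user, keep rank 1.
    "row_number": """
        SELECT id, registration_number
        FROM (SELECT id, registration_number,
                     ROW_NUMBER() OVER (PARTITION BY registration_number
                                        ORDER BY id DESC) AS id_rank_by_user
              FROM my_table)
        WHERE id_rank_by_user = 1
        ORDER BY id""",
}

results = {name: conn.execute(sql).fetchall() for name, sql in QUERIES.items()}
for name, rows in results.items():
    print(name, rows)
```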
As for indexes, I'm assuming you already have an index by registration_number. An additional index on id will help the query, but maybe not by much and maybe not enough to justify the index. I say that because if you're inserting 70K rows at one time the additional index will slow down the INSERT. You'll have to experiment (and check the execution plans) to figure out if the index is worth it.
To find the faster query, check the execution plan and cost of each; that will give you a fair idea. But I agree with Ed Gibbs's solution, as analytics make the query run much faster.
If you feel this table is going to grow very big, then I would suggest partitioning the table and using local indexes. They will definitely help you to form faster queries.
In cases where you want to insert lots of rows, indexes slow down insertion, as the index also has to be updated with each insertion (I would not recommend an index on ID). There are 2 solutions I can think of for this:
You can drop the index before insertion and then recreate it after insertion.
Use reverse key indexes. Check this link: http://oracletoday.blogspot.in/2006/09/there-is-option-to-create-index.html. A reverse key index can impact your query a bit, so there will be a trade-off.
If you are looking for a faster solution and there is a real need to maintain a list of the last activity for each user, then the most robust solution is to maintain a separate table with unique registration_number values and the rowid of the last record created in the log table.
E.g. (only for demo, not checked for syntax validity, sequences and triggers omitted):
create table my_log(id number not null, registration_number number, action_id varchar2(100))
/
create table last_user_action(registration_number number not null, last_action rowid)
/
alter table last_user_action
add constraint pk_last_user_action primary key (registration_number) using index
/
create or replace procedure write_log(p_reg_num number, p_action_id varchar2)
is
v_row_id rowid;
begin
insert into my_log(registration_number, action_id)
values(p_reg_num, p_action_id)
returning rowid into v_row_id;
update last_user_action
set last_action = v_row_id
where registration_number = p_reg_num;
end;
/
With such a schema you can simply query the last actions for every user with good performance:
select *
from
last_user_action lua,
my_log l
where
l.rowid (+) = lua.last_action
Rowid is physical storage identity directly addressing storage block and you can't use it after moving to another server, restoring from backups etc. But if you need such functionality it's simple to add id column from my_log table to last_user_action too, and use one or another depending on requirements.
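A runnable sketch of the same side-table idea, using Python's sqlite3 module as a stand-in for Oracle. It stores the log row's id rather than its rowid, which is the portable variant mentioned above. One deliberate difference from the demo procedure: a plain UPDATE touches nothing when the user has no row in last_user_action yet, so this sketch uses SQLite's INSERT OR REPLACE to cover the first action as well. The sample actions are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE my_log (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    registration_number INTEGER,
    action_id TEXT
);
CREATE TABLE last_user_action (
    registration_number INTEGER PRIMARY KEY,
    last_action INTEGER  -- id of the newest my_log row for this user
);
""")


def write_log(reg_num, action_id):
    """Insert a log row and point the side table at it, in one transaction."""
    with conn:
        cur = conn.execute(
            "INSERT INTO my_log (registration_number, action_id) VALUES (?, ?)",
            (reg_num, action_id),
        )
        conn.execute(
            "INSERT OR REPLACE INTO last_user_action "
            "(registration_number, last_action) VALUES (?, ?)",
            (reg_num, cur.lastrowid),
        )


for reg_num, action_id in [(100, "login"), (200, "login"), (100, "logout")]:
    write_log(reg_num, action_id)

# One row per user, joined to that user's most recent log entry.
latest = conn.execute("""
    SELECT lua.registration_number, l.action_id
    FROM last_user_action lua
    LEFT JOIN my_log l ON l.id = lua.last_action
    ORDER BY lua.registration_number
""").fetchall()
print(latest)
```

In Oracle the same upsert would typically be written with MERGE, or with an UPDATE followed by an INSERT when no row was updated.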

How to create database INDEX for SQL expression?

I am a beginner with indexes. I want to create an index for this SQL expression, which takes too much time to execute, so I would like to know on exactly which columns I should create the index.
I am using DB2, but never mind; I think the question is very general.
My SQL expression is:
select * from incident where (relatedtoglobal=1)
and globalticketid in (select ticketid from INCIDENT where status='RESOLVED')
and statusdate <='2012-10-09 12:12:12'
Should I create an index with these 5 columns, or how?
Thanks
Your query:
select *
from incident
where relatedtoglobal = 1
and globalticketid in ( select ticketid
from INCIDENT
where status='RESOLVED'
)
and statusdate <='2012-10-09 12:12:12' ;
And the subquery inside:
select ticketid
from INCIDENT
where status='RESOLVED'
An index on (status, ticketid) will certainly help efficiency of the subquery evaluation and thus of the query.
For the query, besides the previous index, you'll need one more index. The (relatedtoglobal, globalticketid) may be sufficient.
I'm not sure if a more complex indexing would/could be used by the DB2 engine.
Like one on (relatedtoglobal, globalticketid) INCLUDE (statusdate) or
Two indices, one on (relatedtoglobal, globalticketid) and one on (relatedtoglobal, statusdate)
The DB2 documentation is not an easy read but has many details. Start with CREATE INDEX statement and Implementing Indexes.
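A small sketch of the two basic indexes suggested above, run against Python's sqlite3 module because it needs no server. SQLite is only a stand-in: DB2's optimizer will make its own choices, and the index names and sample rows here are assumptions. The point is the workflow of creating both indexes, running the query, and reading the plan to see which indexes are touched.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE incident (
    ticketid INTEGER, status TEXT, relatedtoglobal INTEGER,
    globalticketid INTEGER, statusdate TEXT
);
-- Serves the subquery: filter on status, read ticketid from the index.
CREATE INDEX ix_incident_status_ticket ON incident (status, ticketid);
-- Serves the outer query: equality on both leading columns.
CREATE INDEX ix_incident_global ON incident (relatedtoglobal, globalticketid);
""")
conn.executemany(
    "INSERT INTO incident VALUES (?, ?, ?, ?, ?)",
    [
        (1, "RESOLVED", 0, None, "2012-10-01 00:00:00"),  # resolved global ticket
        (2, "OPEN", 0, None, "2012-10-01 00:00:00"),      # open global ticket
        (10, "OPEN", 1, 1, "2012-10-05 00:00:00"),        # matches
        (11, "OPEN", 1, 2, "2012-10-05 00:00:00"),        # parent not resolved
        (12, "OPEN", 1, 1, "2012-11-01 00:00:00"),        # status date too late
    ],
)

QUERY = """
SELECT ticketid
FROM incident
WHERE relatedtoglobal = 1
  AND globalticketid IN (SELECT ticketid FROM incident
                         WHERE status = 'RESOLVED')
  AND statusdate <= '2012-10-09 12:12:12'
ORDER BY ticketid
"""

result = conn.execute(QUERY).fetchall()
plan = [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + QUERY)]
print(result)
print(plan)
```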

How to create index for my table?

I have a table with the format below, and I know the most commonly used SQL on it. My question is how to create an index on my table so that this SQL query has the best performance. By the way, my DB is Sybase ASE 12.5.
Table t:
bu, name, date, score_a, score_b
SQL:
SELECT bu, name, max(score_a), max(score_b)
FROM
t
WHERE date > '20110101' AND date < '20110901'
GROUP BY bu, name
Thanks for any suggestions.
Basically you need to add indexes to the fields used by the WHERE and GROUP BY clauses, so I'd go with date, bu and name. How to create an index:
CREATE INDEX index_name ON table_name (column_name);
In your case:
CREATE INDEX idate ON t (date);
The index on date suggested by Matino will make sure Sybase only hits rows contributing to the result.
As all fields from each row are used in the query, any other indexes won't help.
The only way to speed up the query some more would be to include all columns in the date index. But that would normally be overkill!
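To make the "include all columns in the date index" idea concrete, here is a sketch using Python's sqlite3 module as a stand-in for Sybase ASE. The index name idate_covering and the sample rows are assumptions, and an ORDER BY is added only to make the printed result deterministic. With every referenced column in the index, the engine can answer the query from the index alone.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t (bu TEXT, name TEXT, date TEXT, score_a INTEGER, score_b INTEGER);
-- date leads (it is the range filter); the rest make the index covering.
CREATE INDEX idate_covering ON t (date, bu, name, score_a, score_b);
""")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?, ?)",
    [
        ("A", "x", "20110105", 1, 5),
        ("A", "x", "20110301", 3, 2),
        ("B", "y", "20110601", 7, 7),
        ("B", "y", "20111001", 9, 9),      # after the range
        ("A", "x", "20101231", 100, 100),  # before the range
    ],
)

QUERY = """
SELECT bu, name, max(score_a), max(score_b)
FROM t
WHERE date > '20110101' AND date < '20110901'
GROUP BY bu, name
ORDER BY bu, name
"""

result = conn.execute(QUERY).fetchall()
plan = [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + QUERY)]
print(result)
print(plan)
```

The trade-off named above still applies: the covering index is roughly as large as the table and has to be maintained on every write.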

Efficient querying of multi-partition Postgres table

I've just restructured my database to use partitioning in Postgres 8.2. Now I have a problem with query performance:
SELECT *
FROM my_table
WHERE time_stamp >= '2010-02-10' and time_stamp < '2010-02-11'
ORDER BY id DESC
LIMIT 100;
There are 45 million rows in the table. Prior to partitioning, this would use a reverse index scan and stop as soon as it hit the limit.
After partitioning (on time_stamp ranges), Postgres does a full index scan of the master table and the relevant partition and merges the results, sorts them, then applies the limit. This takes way too long.
I can fix it with:
SELECT * FROM (
SELECT *
FROM my_table_part_a
WHERE time_stamp >= '2010-02-10' and time_stamp < '2010-02-11'
ORDER BY id DESC
LIMIT 100) t
UNION ALL
SELECT * FROM (
SELECT *
FROM my_table_part_b
WHERE time_stamp >= '2010-02-10' and time_stamp < '2010-02-11'
ORDER BY id DESC
LIMIT 100) t
UNION ALL
... and so on ...
ORDER BY id DESC
LIMIT 100
This runs quickly. The partitions where the times-stamps are out-of-range aren't even included in the query plan.
My question is: Is there some hint or syntax I can use in Postgres 8.2 to prevent the query-planner from scanning the full table but still using simple syntax that only refers to the master table?
Basically, can I avoid the pain of dynamically building the big UNION query over each partition that happens to be currently defined?
EDIT: I have constraint_exclusion enabled (thanks @Vinko Vrsalovic)
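If the per-partition UNION ALL workaround has to stay, generating it is less painful than writing it by hand. The sketch below is one way to do that in Python; the function name build_partition_query is hypothetical, and the caller is assumed to supply the list of partition tables already. It only builds the SQL text and does not talk to a database.

```python
from datetime import date


def build_partition_query(partitions, start, end, limit=100):
    """Build the per-partition UNION ALL query shown in the question.

    partitions: names of the child tables to include (supplied by the caller).
    start, end: datetime.date bounds for the half-open time_stamp range.
    """
    if not partitions:
        raise ValueError("at least one partition is required")
    for name in partitions:
        # Table names cannot be bound as parameters, so validate them.
        if not name.isidentifier():
            raise ValueError(f"unexpected partition name: {name!r}")

    branch = (
        "SELECT * FROM (SELECT * FROM {table} "
        "WHERE time_stamp >= '{start}' AND time_stamp < '{end}' "
        "ORDER BY id DESC LIMIT {limit}) t"
    )
    branches = [
        branch.format(
            table=name,
            start=start.isoformat(),  # dates render as YYYY-MM-DD literals
            end=end.isoformat(),
            limit=int(limit),
        )
        for name in partitions
    ]
    # Each branch is limited on its own, then the union is limited again.
    return "\nUNION ALL\n".join(branches) + f"\nORDER BY id DESC\nLIMIT {int(limit)}"


sql = build_partition_query(
    ["my_table_part_a", "my_table_part_b"], date(2010, 2, 10), date(2010, 2, 11)
)
print(sql)
```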
Have you tried Constraint Exclusion (section 5.9.4 in the document you've linked to)?
Constraint exclusion is a query optimization technique that improves performance for partitioned tables defined in the fashion described above. As an example:
SET constraint_exclusion = on;
SELECT count(*) FROM measurement WHERE logdate >= DATE '2006-01-01';
Without constraint exclusion, the above query would scan each of the partitions of the measurement table. With constraint exclusion enabled, the planner will examine the constraints of each partition and try to prove that the partition need not be scanned because it could not contain any rows meeting the query's WHERE clause. When the planner can prove this, it excludes the partition from the query plan.
You can use the EXPLAIN command to show the difference between a plan with constraint_exclusion on and a plan with it off.
I had a similar problem that I was able to fix by casting the conditions in the WHERE clause.
EG: (assuming the time_stamp column is timestamptz type)
WHERE time_stamp >= '2010-02-10'::timestamptz and time_stamp < '2010-02-11'::timestamptz
Also, make sure the CHECK condition on the table is defined the same way...
EG:
CHECK (time_stamp < '2010-02-10'::timestamptz)
I had the same problem and it boiled down to two reasons in my case:
I had an indexed column of type timestamp WITH time zone, but the partition constraint on this column used type timestamp WITHOUT time zone.
After fixing the constraints, an ANALYZE of all child tables was needed.
Edit: another bit of knowledge - it's important to remember that constraint exclusion (which allows PG to skip scanning some tables based on your partitioning criteria) doesn't work with, quote: non-immutable function such as CURRENT_TIMESTAMP
I had requests with CURRENT_DATE and it was part of my problem.
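One way around the CURRENT_DATE limitation is to compute the date in the application and send it as a literal, so the planner sees constants it can compare against the CHECK constraints. The Python sketch below only builds the predicate text; the function name day_range_predicate is hypothetical, and the timestamptz casts assume the column and the constraints use that type, as in the answer above.

```python
from datetime import date, timedelta


def day_range_predicate(day, column="time_stamp"):
    """Return a half-open one-day range predicate with literal bounds."""
    next_day = day + timedelta(days=1)
    return (
        f"{column} >= '{day.isoformat()}'::timestamptz "
        f"AND {column} < '{next_day.isoformat()}'::timestamptz"
    )


# date.today() would replace CURRENT_DATE; a fixed date keeps the output stable.
predicate = day_range_predicate(date(2010, 2, 10))
print("SELECT * FROM my_table WHERE " + predicate)
```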

Creating Indexes for Group By Fields?

Do you need to create an index for the fields in a GROUP BY in an Oracle database?
For example:
select *
from some_table
where field_one is not null and field_two = ?
group by field_three, field_four, field_five
I was testing the indexes I created for the above and the only relevant index for this query is an index created for field_two. Other single-field or composite indexes created on any of the other fields will not be used for the above query. Does this sound correct?
It could be correct, but that would depend on how much data you have. Typically I would create an index for the columns I was using in a GROUP BY, but in your case the optimizer may have decided that after using the field_two index that there wouldn't be enough data returned to justify using the other index for the GROUP BY.
No, this can be incorrect.
If you have a large table, Oracle can prefer deriving the fields from the indexes rather than from the table, even if there is no single index that covers all values.
In the latest article on my blog, NOT IN vs. NOT EXISTS vs. LEFT JOIN / IS NULL: Oracle, there is a query in which Oracle does not use a full table scan but rather joins two indexes to get the column values:
SELECT l.id, l.value
FROM t_left l
WHERE NOT EXISTS
(
SELECT value
FROM t_right r
WHERE r.value = l.value
)
The plan is:
SELECT STATEMENT
HASH JOIN ANTI
VIEW , 20090917_anti.index$_join$_001
HASH JOIN
INDEX FAST FULL SCAN, 20090917_anti.PK_LEFT_ID
INDEX FAST FULL SCAN, 20090917_anti.IX_LEFT_VALUE
INDEX FAST FULL SCAN, 20090917_anti.IX_RIGHT_VALUE
As you can see, there is no TABLE SCAN on t_left here.
Instead, Oracle takes the indexes on id and value, joins them on rowid and gets the (id, value) pairs from the join result.
Now, to your query:
SELECT *
FROM some_table
WHERE field_one is not null and field_two = ?
GROUP BY
field_three, field_four, field_five
First, it will not compile, since you are selecting * from a table with a GROUP BY clause.
You need to replace * with expressions based on the grouping columns and aggregates of the non-grouping columns.
You will most probably benefit from the following index:
CREATE INDEX ix_sometable_23451 ON some_table (field_two, field_three, field_four, field_five, field_one)
, since it will contain everything for both filtering on field_two, sorting on field_three, field_four, field_five (useful for GROUP BY) and making sure that field_one is NOT NULL.
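A sketch of that composite index in action, using Python's sqlite3 module as a stand-in for Oracle, so the plan output is SQLite's and not Oracle's. The query is a corrected form of the one in the question, selecting the grouping columns plus COUNT(*) in place of *; the sample rows and the added ORDER BY are assumptions made to keep the output deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE some_table (
    field_one TEXT, field_two INTEGER,
    field_three TEXT, field_four TEXT, field_five TEXT
);
-- Filter column first, then the grouping columns, then field_one.
CREATE INDEX ix_sometable_23451
    ON some_table (field_two, field_three, field_four, field_five, field_one);
""")
conn.executemany(
    "INSERT INTO some_table VALUES (?, ?, ?, ?, ?)",
    [
        ("x", 1, "a", "b", "c"),
        ("y", 1, "a", "b", "c"),
        (None, 1, "a", "b", "c"),  # dropped by field_one IS NOT NULL
        ("z", 1, "d", "e", "f"),
        ("z", 2, "a", "b", "c"),   # dropped by field_two = 1
    ],
)

QUERY = """
SELECT field_three, field_four, field_five, COUNT(*) AS row_count
FROM some_table
WHERE field_one IS NOT NULL AND field_two = ?
GROUP BY field_three, field_four, field_five
ORDER BY field_three, field_four, field_five
"""

result = conn.execute(QUERY, (1,)).fetchall()
plan = [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + QUERY, (1,))]
print(result)
print(plan)
```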
Do you need to create an index for the fields in a GROUP BY in an Oracle database?
No. You don't need to, in the sense that a query will run irrespective of whether any indexes exist or not. Indexes are provided to improve query performance.
It can, however, help; but I'd hesitate to add an index just to help one query, without thinking about the possible impact of the new index on the database.
...the only relevant index for this query is an index created for field_two. Other single-field or composite indexes created on any of the other fields will not be used for the above query. Does this sound correct?
Not always. Often a GROUP BY will require Oracle to perform a sort (but not always); and you can eliminate the sort operation by providing a suitable index on the column(s) to be sorted.
Whether you actually need to worry about the GROUP BY performance, however, is an important question for you to think about.