Can a clustered index help in this scenario?

We have two tables, Dispense and DispenseDetail. Dispense has a foreign key, PatientID.
Most processes and queries run in the context of a patient.
Currently the Dispense table has a non-clustered index on (DispenseID, PatientID).
DispenseDetail has DispenseDetailID as its primary key and a non-clustered index on (DispenseID).
We are noticing some slowness caused by page IO latches (PAGEIOLATCH_SH), as SQL Server has to bring data from disk into memory.
I am thinking about a clustered index on (DispenseID, DispenseDetailID), which could help retrieve the dispense details of a particular patient, but it may slow down dispense inserts, and inserts are more important: without them there won't be data to query.
Would a non-clustered index on (DispenseID, DispenseDetailID) help at all?
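For concreteness, a minimal sketch of the two options under consideration; the schema and index names are assumptions, and option 1 presumes DispenseDetail's primary key is (or can be made) non-clustered:

-- Option 1: cluster DispenseDetail on (DispenseID, DispenseDetailID) so a
-- patient's detail rows are stored contiguously; reads improve, but inserts
-- may no longer land at the end of the table if DispenseID values arrive
-- out of order.
CREATE UNIQUE CLUSTERED INDEX CX_DispenseDetail
    ON dbo.DispenseDetail (DispenseID, DispenseDetailID);

-- Option 2: a non-clustered index with the same keys, leaving the current
-- physical layout (and insert pattern) untouched.
CREATE NONCLUSTERED INDEX IX_DispenseDetail_DispenseID
    ON dbo.DispenseDetail (DispenseID, DispenseDetailID);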
Any comments or thoughts will be much appreciated.
Thanks!
Information for sqlonly:
The database is on a VM; the host has 4 physical CPUs, each with 6 cores (24 cores in total), and 32 virtual CPUs are allocated.
The Dispense table has 4,000,000+ rows; DispenseDetail has 11,000,000+ rows.
I don't know how to compute or get the average page IO latch wait time. Querying sys.dm_os_latch_stats and ordering by wait time gives this result set:
latch_class                       waiting_requests_count  wait_time_ms  max_wait_time_ms
BUFFER                                          62658377      97584783             12051
ACCESS_METHODS_DATASET_PARENT                     950195       7870081             19652
ACCESS_METHODS_HOBT_VIRTUAL_ROOT                  799403       5071290              5692
BACKUP_OPERATION                                  785245        372930               206
LOG_MANAGER                                            7         40403             11235
ACCESS_METHODS_HOBT_COUNT                           7959         19728              1587
NESTING_TRANSACTION_FULL                          122342          7969                59
ACCESS_METHODS_ACCESSOR_CACHE                      67877          5143                65
ACCESS_METHODS_BULK_ALLOC                           1644           734                49
ACCESS_METHODS_HOBT                                   15            76                15
SPACEMGR_ALLOCEXTENT_CACHE                           169            71                10
SPACEMGR_IAM_PAGE_RANGE_CACHE                         68            49                 4
NESTING_TRANSACTION_READONLY                        1942            11                 1
SERVICE_BROKER_WAITFOR_MANAGER                        31             9                 4
TRACE_CONTROLLER                                       1             1                 1
APPEND_ONLY_STORAGE_FIRST_ALLOC                       11             1                 1
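One way to compute an average wait is total wait time divided by the number of waits. A sketch: the first query uses the columns shown above, and the second pulls the PAGEIOLATCH waits themselves from the companion DMV sys.dm_os_wait_stats:

-- Average latch wait per class (total wait / number of waits):
SELECT latch_class,
       waiting_requests_count,
       wait_time_ms,
       wait_time_ms * 1.0 / NULLIF(waiting_requests_count, 0) AS avg_wait_ms
FROM   sys.dm_os_latch_stats
ORDER  BY wait_time_ms DESC;

-- PAGEIOLATCH_SH specifically is exposed through the wait-stats DMV:
SELECT wait_type,
       waiting_tasks_count,
       wait_time_ms,
       wait_time_ms * 1.0 / NULLIF(waiting_tasks_count, 0) AS avg_wait_ms
FROM   sys.dm_os_wait_stats
WHERE  wait_type LIKE 'PAGEIOLATCH%';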
In dev I used the current indexes to get just DispenseID and DispenseDetailID for the patient; the outcome is an index seek. However, that result set must be inserted into a temp table to get the other fields, and the insert into the temp table is costly, so there is no net improvement.
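Roughly, the dev test had this shape (a sketch; the temp table name, join columns, and @PatientID parameter are assumed):

-- Index seek on the narrow indexes to get the key pairs for one patient...
SELECT d.DispenseID, dd.DispenseDetailID
INTO   #PatientDispenseDetails        -- ...but this INSERT dominates the cost
FROM   dbo.Dispense AS d
JOIN   dbo.DispenseDetail AS dd
       ON dd.DispenseID = d.DispenseID
WHERE  d.PatientID = @PatientID;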
Thanks!

Tuning Performance of Large Oracle SQL Query

Introduction
Hi everyone, I am a bit of an Oracle SQL novice, coming mostly from Python. I have a large procedure that I will outline below with an example. The procedure takes upwards of 5 minutes to process 500 records and essentially hangs beyond 750 records, so the runtime is growing much faster than linearly.
SQL
The procedure is essentially two select blocks pulling data from two different sources. These blocks are wrapped inside a larger select statement that filters and matches records and selects the remainder:
For example:
SELECT DISTINCT
*matched sales*
FROM
(SELECT
*direct sales info from db1*
FROM
DB1
WHERE
sales_code = 'DIRECT') a,
db2.prod,
db2.cont,
db2.cust, --etc
(SELECT *qualified customer information*
FROM *a few DB2 tables*
WHERE code = 'DIR') qual
--A few more of the above inline views to get eligible cust and price
WHERE
*DB2 product numbers, customer numbers and contract numbers are matched to each other & the above views* -- this is where most of the time is spent
--ex
cust.cont_num = cont.cont_num
*DB1 records matched to DB2 records*
--ex
a.cont_num = cont.cont_num
Question
OK, so my issue here is essentially the performance of the DB2 block: selecting from all the different tables, creating the inline views, and matching them all together. This part alone takes upwards of 10 minutes.
As a novice, how can I tune this? Would storing this block in a temp table work, so it doesn't have to be re-evaluated over and over? Or should I use more inline views? Nest another select block like the first one?
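One common way to evaluate a shared block once is subquery factoring (a WITH clause); whether Oracle actually materialises it can be nudged with the undocumented but widely used MATERIALIZE hint. A sketch with placeholder names taken from the outline above:

WITH qual AS (
    SELECT /*+ MATERIALIZE */
           cust_num, cont_num      -- qualified customer info (placeholder columns)
    FROM   db2_tables              -- a few DB2 tables (placeholder)
    WHERE  code = 'DIR'
)
SELECT a.*, qual.*
FROM   (SELECT * FROM db1 WHERE sales_code = 'DIRECT') a,
       qual
WHERE  a.cont_num = qual.cont_num;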
Explain Plan
OPERATION         OBJECT_NAME              OPTIONS         CARDINALITY    COST
SELECT STATEMENT                                             639039097   31298
HASH JOIN                                                    639039097   31298
INDEX             CARSNG.IE_PRODID_IDX_4   FAST FULL SCAN         9184      13
HASH JOIN                                                    639039097   29585
TABLE ACCESS      CARSNG.UOM               FULL                      6       3
HASH JOIN                                                    639039097   27881
VIEW              CARSNG.index$_join$_011                         8236      77
HASH JOIN
HASH JOIN
INDEX             CARSNG.FK_PROD_IDX_4     FAST FULL SCAN         8236      20
INDEX             CARSNG.IE_PROD_IDX_1     FAST FULL SCAN         8236      33
INDEX             CARSNG.PK_PROD           FAST FULL SCAN         8236      24
HASH JOIN                                                    639094333   26104
INDEX             CARSNG.IE_CPPT_IDX_3     FAST FULL SCAN      1254629    2473
NESTED LOOPS                                                    634106   17709
HASH JOIN                                                         2580    2212
VIEW              CARSNG.index$_join$_014                           24       2
HASH JOIN
INDEX             CARSNG.AK_WHOAMI_IDX_1   FAST FULL SCAN           24       1
INDEX             CARSNG.PK_WHOAMI         FAST FULL SCAN           24       1
HASH JOIN                                                         2580    2210
HASH JOIN                                                         2589    2161
VIEW                                                              2589    1690
HASH                                       GROUP BY               2589    1690
NESTED LOOPS                                                      2589    1689
NESTED LOOPS                                                      5874    1689
VIEW              SYS.VW_GBF_18                                     89     626
HASH                                       GROUP BY                 89     626
HASH JOIN                                  SEMI                   1963     625
TABLE ACCESS      CARSNG.CPGRP             FULL                   1970     591
VIEW              CARSNG.index$_join$_003                         6415      34
HASH JOIN
INDEX             CARSNG.FK_CONT_IDX_3     FAST FULL SCAN         6415      18
INDEX             CARSNG.AK_CONT_IDX_1     FAST FULL SCAN         6415      25
INDEX             CARSNG.IE_CPPT_IDX_2     RANGE SCAN               66       2
TABLE ACCESS      CARSNG.CPPT              BY INDEX ROWID           29      12
VIEW              CARSNG.index$_join$_013                        43365     471
HASH JOIN
HASH JOIN
INDEX             CARSNG.PK_CPGRP          FAST FULL SCAN        43365     114
INDEX             CARSNG.AK_CPGRP_IDX_4    FAST FULL SCAN        43365     192
INDEX             CARSNG.IE_CPGRP_IDX_3    FAST FULL SCAN        43365     168
VIEW              CARSNG.index$_join$_012                         6415      49
HASH JOIN
INDEX             CARSNG.FK_CONT_IDX_3     FAST FULL SCAN         6415      18
INDEX             CARSNG.AK_CONT_IDX_3     FAST FULL SCAN         6415      44
INDEX             CARSNG.IE_ELIG_IDX_1     RANGE SCAN              246       6
I actually just figured this out, but maybe my answer can help someone debugging in the future. I converted the DB2 selections into an inline view, and that helped some. But where I was really going wrong was SELECT DISTINCT rather than SELECT; changing this saved an unreal amount of time. There was also one table that was referenced but never joined to anything, which effectively produced a Cartesian product, so I removed it.
Use a SELECT ... FROM ... GROUP BY <column_names>:
instead of using DISTINCT on the selected columns, use GROUP BY for them.
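A sketch of the rewrite with placeholder names; both forms return the same de-duplicated rows, but the optimizer may treat them differently:

-- Before: de-duplicate with DISTINCT
SELECT DISTINCT cust_num, cont_num, prod_num
FROM   matched_sales;

-- After: de-duplicate with GROUP BY
SELECT cust_num, cont_num, prod_num
FROM   matched_sales
GROUP  BY cust_num, cont_num, prod_num;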

Fragmentation of one specific index is increasing too often

I have a large table with more than 10 indexes, and I have a fragmentation problem on one specific index. During the day, thousands of rows are inserted into this table, and the fragmentation of just this one index grows very quickly. The other indexes are fine (maybe 0.01% per hour), but this one grows by 3-4% per hour! It will probably reach 50-60% by the end of the day.
Can you help me figure out why this index fragments so quickly?
----- Fill factor
This specific index: 0% (SQL Server treats 0 as 100, i.e. pages packed completely full)
Another index (no fragmentation problem): 90%
----- Index details
non-clustered
2 index key columns: a bit and an nvarchar(100) column
1 included column: FK_OrderID (int, foreign key to another table)
number of rows in the table: 6.5 million
size of the table: 6.2 GB
and DBCC SHOWCONTIG details for the table:
Pages Scanned................................: 805566
Extents Scanned..............................: 100877
Extent Switches..............................: 108951
Avg. Pages per Extent........................: 8.0
Scan Density [Best Count:Actual Count].......: 92.42% [100696:108952]
Logical Scan Fragmentation ..................: 1.43%
Extent Scan Fragmentation ...................: 19.82%
Avg. Bytes Free per Page.....................: 983.4
Avg. Page Density (full).....................: 87.85%
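As an aside, DBCC SHOWCONTIG is deprecated; the same figures are available from sys.dm_db_index_physical_stats. A sketch, with the table name assumed:

SELECT index_id,
       avg_fragmentation_in_percent,   -- logical scan fragmentation
       avg_page_space_used_in_percent  -- page density
FROM   sys.dm_db_index_physical_stats(
           DB_ID(), OBJECT_ID(N'dbo.Orders'), NULL, NULL, 'SAMPLED');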
Thanks!
I have resolved this issue by setting the fill factor to 80. With the previous fill factor of 0 (equivalent to 100), every page was packed full, so any insert that didn't land at the end of the index forced a page split; leaving 20% free space absorbs those inserts. Thanks for the replies.
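For reference, a sketch of the fix (the index and table names are placeholders):

ALTER INDEX IX_Orders_Status_Name ON dbo.Orders
REBUILD WITH (FILLFACTOR = 80);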

Database design for a step by step wizard

I am designing a system containing logical steps, each with some associated actions (the actions are not part of the question, but they are crucial for each step in the list).
The thing is that I need a way to define all the logical steps in an ordered fashion, so that I can fetch the list with a query and also modify it later on.
Does anyone have experience with this kind of database design?
I have been thinking of having a table named wizard_steps (or something similar) and using a priority column to define the order, but I feel this design will fail at some point (items with the same priority, inserting a new item means rearranging all the following items, and so forth).
Another design I have considered is a "next item" column in the wizard_steps table, but that doesn't feel right either.
So, to summarize: I am trying to model a list of elements where the order is crucial (and the design should be open enough to support multiple lists).
Any ideas on what the database should look like?
Thanks!
EDIT: I found this Yii component that I will check out: http://www.yiiframework.com/extension/simpleworkflow/
It might be a good solution!
If I understand you correctly, your main concern is to create a schema that supports ordered lists and provides easy insertion/reordering of items.
The following table design:
id_list  item_priority  foreign_itemdef_id
      1              1                 245
      1              2                  32
      1              3                  45
      2              1                 156
      2              2                 248
      2              3                 127
coupled to a table with the item definitions, will be easy to query but difficult to maintain, especially for insertions.
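As DDL, that first design could look like this (a sketch; the names follow the sample data, and the primary key choice is an assumption):

CREATE TABLE wizard_step (
    id_list            INT NOT NULL,  -- which list the step belongs to
    item_priority      INT NOT NULL,  -- sort order within the list
    foreign_itemdef_id INT NOT NULL,  -- reference to the item definition table
    PRIMARY KEY (id_list, item_priority)
);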
The second design:
id_list  first_item_id
      1             45
      2             38
coupled to the linked list:
item_id  next_item  foreign_itemdef_id
     45        381                  56
    381       NULL                  59
     38         39                  89
     39         42                  78
     42       NULL                  45
will be both difficult to query and to update (you should update the linked list inside a transaction, otherwise it can become corrupted).
I would prefer the first solution for simplicity.
Depending on your update frequency, you might consider leaving large gaps between item_priority values to make insertion easier:
id_list  item_priority  foreign_itemdef_id
      1           1000                 245
      1           2000                  32
      1           3000                  45
      2           1000                 156
      2           2000                 248
      2           3000                 127
      1           2500                  46  -- late insertion
      1           2750                  47  -- late insertion
EDIT:
Here's a query that makes room for an insertion: it increments the priority of every row at or above the target position.
$query_make_room_for_new_item = "UPDATE item_priority_table SET item_priority = item_priority + 1 WHERE item_priority >= " . $new_item_position_priority . " AND id_list = " . $id_list;
Then insert your item with priority $new_item_position_priority
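Fetching a list in order is then a single query (a sketch, with table and column names as above):

SELECT foreign_itemdef_id
FROM   item_priority_table
WHERE  id_list = 1
ORDER  BY item_priority;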

Oracle SQL query taking a day to return results over a dblink

I have the following Oracle SQL query that gives me a month-wise report between two dates. For example, for November I want the sum of values between 01 Nov and 30 Nov.
The table being queried resides in another database and is accessed over a dblink. The DT column is of NUMBER type (e.g. 20101201).
SELECT /*+ PARALLEL(A 8) DRIVING_SITE(A) */
       TO_CHAR(TRUNC(TRUNC(SYSDATE,'MM') - 1,'MM'),'MONYYYY') "MONTH",
       TYPE AS "TYPE", COLUMN, COUNT(DISTINCT A) AS "A_COUNT",
       COUNT(COLUMN) AS NO_OF_COLS, SUM(DURATION) AS "SUM_DURATION",
       SUM(COST) AS "COST"
FROM   A#LN_PROD A
WHERE  DT >= TO_NUMBER(TO_CHAR(add_months(SYSDATE,-1),'YYYYMM"01"'))
AND    DT <  TO_NUMBER(TO_CHAR(SYSDATE,'YYYYMM"01"'))
GROUP  BY TYPE, COLUMN
The query has been running for a day and has not completed. Kindly suggest any optimisation I can pass on to my DBA for the dblink, any tuning of the query, or a rewrite.
UPDATES ON THE TABLE
The table is partitioned on the date column and has almost 1 billion records.
Below i have given the EXPLAIN PLAN from TOAD
Plan
SELECT STATEMENT REMOTE ALL_ROWSCost: 1,208,299 Bytes: 34,760 Cardinality: 790
12 PX COORDINATOR
11 PX SEND QC (RANDOM) SYS.:TQ10002 Cost: 1,208,299 Bytes: 34,760 Cardinality: 790
10 SORT GROUP BY Cost: 1,208,299 Bytes: 34,760 Cardinality: 790
9 PX RECEIVE Cost: 1,208,299 Bytes: 34,760 Cardinality: 790
8 PX SEND HASH SYS.:TQ10001 Cost: 1,208,299 Bytes: 34,760 Cardinality: 790
7 SORT GROUP BY Cost: 1,208,299 Bytes: 34,760 Cardinality: 790
6 PX RECEIVE Cost: 1,208,299 Bytes: 34,760 Cardinality: 790
5 PX SEND HASH SYS.:TQ10000 Cost: 1,208,299 Bytes: 34,760 Cardinality: 790
4 SORT GROUP BY Cost: 1,208,299 Bytes: 34,760 Cardinality: 790
3 FILTER
2 PX BLOCK ITERATOR Cost: 1,203,067 Bytes: 15,066,833,144 Cardinality: 342,428,026 Partition #: 11 Partitions accessed #1 - #5
1 TABLE ACCESS FULL TABLE CDRR.FRD_CDF_DATA_INTL_IN_P Cost: 1,203,067 Bytes: 15,066,833,144 Cardinality: 342,428,026 Partition #: 11
The following things I am going to do today; any additional tips would be helpful.
Gather table-wise statistics for this table, which may give a better execution plan.
Check whether a local index is created for the partition.
Use BETWEEN instead of >= and <.
As usual for this type of question, an explain plan would be useful. It would help us work out what is actually going on in the database.
Ideally you want the query to run on the remote database and then send the result set back, rather than sending the data across the link and running the query locally; that way less data crosses the link. The DRIVING_SITE hint can help with this, although Oracle is usually fairly smart about it, so it might not help at all.
Oracle seems to have got better at running remote queries but there still can be problems.
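For reference, the hint names the alias of the remote table so the query is shipped to that site (a sketch, assuming A#LN_PROD resolves to the remote table and with an illustrative column list):

SELECT /*+ DRIVING_SITE(a) */
       TYPE, COUNT(*) AS CNT
FROM   A#LN_PROD a
WHERE  DT >= 20101201
AND    DT <  20110101
GROUP  BY TYPE;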
Also, it might pay to simplify some of your date conversions.
For example, replace this:
TO_CHAR(TRUNC(TRUNC(SYSDATE,'MM')- 1,'MM'),'MONYYYY')
with this:
TO_CHAR(add_months(TRUNC(SYSDATE,'MM'), -1),'MONYYYY')
It is probably slightly more efficient but also is easier to read.
Likewise replace this:
WHERE DT >=TO_NUMBER(TO_CHAR(TRUNC(TRUNC(SYSDATE,'MM')-1,'MM'),'YYYYMMDD'))
AND DT < TO_NUMBER(TO_CHAR(TRUNC(TRUNC(SYSDATE,'MM'),'MM'),'YYYYMMDD'))
with
WHERE DT >=TO_NUMBER(TO_CHAR(add_months(TRUNC(SYSDATE,'MM'), -1),'YYYYMMDD'))
AND DT < TO_NUMBER(TO_CHAR(TRUNC(SYSDATE,'MM'),'YYYYMMDD'))
or even
WHERE DT >=TO_NUMBER(TO_CHAR(add_months(SYSDATE,-1),'YYYYMM"01"'))
AND DT < TO_NUMBER(TO_CHAR(SYSDATE,'YYYYMM"01"'))
It may be due to several issues:
1. Network speed, since the database may be residing on different hardware.
You can also refer to this link, which covers a similar issue:
http://www.experts-exchange.com/Database/Oracle/Q_21799513.html
Impossible to answer without knowing the table structure, constraints, indexes, data volume, resultset size, network speed, level of concurrency, execution plans etcetera.
Some things I would investigate:
If the table is partitioned, do statistics exist for the partition the query is hitting? A common problem is that statistics are gathered on an empty partition before data has been inserted. Then, when you query it (before the statistics are refreshed), Oracle chooses an index scan when in fact it should use a full table scan on that partition.
Also related to statistics: Make sure that
WHERE DT >=TO_NUMBER(TO_CHAR(TRUNC(TRUNC(SYSDATE,'MM')-1,'MM'),'YYYYMMDD'))
AND DT < TO_NUMBER(TO_CHAR(TRUNC(TRUNC(SYSDATE,'MM'),'MM'),'YYYYMMDD'))
generates the same execution plan as:
WHERE DT >= 20101201
AND DT < 20110101
Updated
What version of Oracle are you on? The reason I'm asking is that on Oracle 10g and later there is another implementation of GROUP BY that should have been selected in this case (hashing rather than sorting). It looks like you are basically sorting the 342 million rows returned by the date filter (about 14 gigabytes). Do you have the RAM to back that up? Otherwise you will be doing a multi-pass sort, spilling to disk, which is likely what is happening.
According to the plan, about 790 rows will be returned. Is that in the right ballpark?
If so, you can rule out network issues :)
Also, I'm not entirely familiar with the format of that plan. Is the table subpartitioned? Otherwise I don't understand the partition #11 reference.

custom sorting or ordering a table without resorting the whole shebang

For ten years we've been using the same custom sorting on our tables. I'm wondering if there is another solution that involves fewer updates, especially since we'd now like to have a replication/publication date and don't want our replication to replicate unnecessary entries. I had a look into nested sets, but they don't seem to do the job for us.
Base table:
id | a_sort
---+-------
 1 |     10
 2 |     20
 3 |     30
After inserting an entry intended for the second position:
insert into table (a_sort) values(15)
the table contains:
id | a_sort
---+-------
 1 |     10
 2 |     20
 3 |     30
 4 |     15
Ordering the table with:
select * from table order by a_sort
and then resorting all the a_sort entries (updating at least ids 2, 3 and 4)
will of course produce the desired output:
id | a_sort
---+-------
 1 |     10
 4 |     20
 2 |     30
 3 |     40
The column names, the column count, the datatypes, possible joins, possible triggers, and the way the resorting is done are all irrelevant to the problem. Also, we've found some pretty neat ways to do this task fast.
The only question is: how the heck can we reduce the updates in the DB to 1 or 2 at most?
It seems like an awfully common problem.
The captain obvious in me once thought: "use an a_sort float(53) and insert using a fixed value of ordervaluefirstentry + abs(ordervaluefirstentry - ordervaluenextentry)/2".
But this would only allow around 1040 "in between" entries - so never resorting seems a bit problematic ;)
You really didn't describe what you're doing with this data, so forgive me if this is a crazy idea for your situation:
You could make a sort of 'linked list' where, instead of a column of sort values, you have a column holding the id of the next-highest-valued row. This would decrease the number of updates to a maximum of 2.
You can make it doubly linked, adding a column for the next lowest as well, which would bring the maximum number of updates to 3.
See:
http://en.wikipedia.org/wiki/Linked_list
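A minimal sketch of the singly linked variant (table and column names are assumptions):

-- Each row points at the id of the next row in sort order; NULL marks the tail.
CREATE TABLE sorted_items (
    id      INT PRIMARY KEY,
    next_id INT NULL REFERENCES sorted_items (id)
);

-- Inserting id 4 between ids 1 and 2 costs one INSERT and one UPDATE:
INSERT INTO sorted_items (id, next_id) VALUES (4, 2);
UPDATE sorted_items SET next_id = 4 WHERE id = 1;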