Select data with a dynamic WHERE clause on a non-indexed column - indexing

I have a table with 30 columns and millions of entries.
I want to execute a stored procedure on this table to search data.
The search criteria are passed in a parameter to this SP.
If I search data with a dynamic WHERE clause on a non-indexed column, it takes a long time.
Below is an example :
Select counterparty_name from counterparty where counterparty_name = 'test'
In this example this counterparty is at row number 5,000,000.
As explained, I can't create an index on this table.
I would like to know if the processing time is normal.
I would like to know if there is any recommendation that could improve the execution time.
Best regards.

If you do not have an index on the column, then the engine will have to scan the clustered index to look for the data (or maybe a smaller index which happens to include that column). As such, it is going to take a long time.
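For reference, if an index were permitted after all, a plain nonclustered index on the search column is what would turn that scan into a seek. A minimal sketch, assuming SQL Server and the table and column from the question (the index name is illustrative):

CREATE NONCLUSTERED INDEX IX_counterparty_name
    ON counterparty (counterparty_name);

Without something like this, every search on counterparty_name reads essentially all of the rows, so a long runtime over millions of entries is expected.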

PostgreSQL index reduces data size but makes the query slower

I have a PostgreSQL table with 7.9GB of JSON data. My goal is to perform aggregations on the whole table on a daily basis; the aggregation results will later be used for analytical reports in Google Data Studio.
One of the queries I'm trying to run looks as follows:
explain analyze
select tender->>'procurementMethodType' as procurement_method,
       tender->>'status' as tender_status,
       sum(cast(tender->'value'->>'amount' as decimal)) as total_expected_value
from tenders
group by 1, 2
The query plan (not reproduced here) showed a sequential scan over the entire table, with an execution time of about 30 seconds.
The problem is that the database has to scan through all the 7.9GB of data, even though the query uses only 3 field values out of approximately 100. So I decided to create the following index:
create index on tenders((tender->>'procurementMethodType'), (tender->>'status'), (cast(tender->'value'->>'amount' as decimal)))
The size of the index is 44MB, which is much smaller than the size of the entire table, so I expected the query to be much faster. However, when I run the same query with the index created, the result (plan output not reproduced here) is that the query with the index is slower! How can this be possible?
EDIT: the table itself contains two columns: the ID column and the jsonb data column:
create table tenders (
id uuid primary key,
tender jsonb
)
The code that does an index-only scan is somewhat deficient in this case. It thinks it needs "tender" to be available in the index in order to fulfill the demand for cast(tender->'value'->>'amount' as decimal). It fails to realize that having cast(tender->'value'->>'amount' as decimal) itself in the index obviates the need for "tender" itself.
So it is doing a regular index scan instead, in which it has to jump from the index to the table for every row it will return, to fish out "tender" and then compute cast(tender->'value'->>'amount' as decimal). This means it is jumping all over the table doing random I/O, which is much slower than just reading the table sequentially and then doing a sort.
You could try an index on ((tender->>'procurementMethodType'), (tender->>'status'), tender). This index would be huge (as large as the table) if it can even be built, but would take away the need for a sort.
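Spelled out, that suggestion is roughly the following (a sketch only; as noted, it may fail to build if individual tender values are too large for a btree entry):

create index on tenders (
    (tender->>'procurementMethodType'),
    (tender->>'status'),
    tender
);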
But your current query finishes in 30 seconds. For a query that is only run once a day, does it really need to be faster than this?

Dictionary database: one table vs. a table for each character

I have a very simple database containing one table T:
wOrig nvarchar(50), not null
wTran nvarchar(50), not null
The table has over 50 million rows. I execute a simple query:
select wTran from T where wOrig = 'myword'
The query takes about 40 seconds to complete. I divided the table based on the first character of wOrig, and the execution time became much smaller (in proportion to each new table's size).
Am I missing something here? Shouldn't the database use a more efficient way to do the search, like a binary search?
My question: what changes to the database options - based on this situation - could make the search more efficient, so that I can keep all the data in one table?
You should be using an index. For your query, you want an index on T(wOrig). Your query will be much faster:
create index idx_T_wOrig on T(wOrig);
Depending on considerations such as space and insert/update characteristics, a clustered index on (wOrig) or (wOrig, wTran) might be the best solution.
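As a concrete sketch of the composite option (the index name is illustrative), such an index lets the engine answer the query from the index alone, without touching the base table:

create index idx_T_wOrig_wTran on T(wOrig, wTran);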

Oracle SQL statement on a very large table

I'm relatively new to SQL and I have a statement which takes forever to run.
SELECT
    sum(a.amountcur)
FROM
    custtrans a
WHERE
    a.transdate <= '2013-12-31';
It's a large table, but the statement takes about 6 minutes!
Any ideas why?
Your select, as you post it, will read 99% of the whole table (2013-12-31 is just a week ago, and I assume most entries are before that date and only very few after). If your table has many large columns (like varchar2(4000)), all that data will be read as well when Oracle scans the table. So you might read several KB per row just to get the 30 bytes you need for amountcur and transdate.
If you have this scenario, create a combined index on transdate and amountcur:
CREATE INDEX myindex ON custtrans(transdate, amountcur)
With the combined index, Oracle can read the index to fulfill your query and doesn't have to touch the main table at all, which might result in considerably less data that needs to be read from disk.
Make sure the table has an index on transdate.
create index custtrans_idx on custtrans (transdate);
Also, if this field is defined as a date in the table, then do:
SELECT sum(a.amountcur)
FROM custtrans a
WHERE a.transdate <= to_date('2013-12-31', 'yyyy-mm-dd');
If the table is really large, the query has to scan every row with transdate below the given date.
Even if you have an index on transdate and it helps to stop the scan early (which it may not), when the number of matching rows is very high, it will take considerable time to scan them all and sum the values.
To speed things up, you could calculate partial sums, e.g. for each past month, assuming that your data is historical and the past does not change. Then you'd need to scan custtrans for only 1-2 months, quickly scan the table with the monthly sums, and add the results; a sketch follows below.
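A minimal sketch of that idea in Oracle SQL (the custtrans_monthly table is hypothetical, and the cutoff date comes from the question):

-- Precompute monthly totals once (e.g., refresh nightly); past months never change
CREATE TABLE custtrans_monthly AS
SELECT TRUNC(transdate, 'MM') AS month_start,
       SUM(amountcur)         AS month_total
FROM custtrans
GROUP BY TRUNC(transdate, 'MM');

-- At query time: sum all closed months, then add the current partial month
SELECT (SELECT NVL(SUM(month_total), 0)
        FROM custtrans_monthly
        WHERE month_start < TRUNC(TO_DATE('2013-12-31', 'yyyy-mm-dd'), 'MM'))
     + (SELECT NVL(SUM(amountcur), 0)
        FROM custtrans
        WHERE transdate >= TRUNC(TO_DATE('2013-12-31', 'yyyy-mm-dd'), 'MM')
          AND transdate <= TO_DATE('2013-12-31', 'yyyy-mm-dd')) AS total
FROM dual;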
Try to create an index on just the column amountcur:
CREATE INDEX myindex ON custtrans(amountcur)
In this case Oracle will most probably read only the index (an index full scan), nothing else.
Correction, as mentioned in a comment: because the WHERE clause filters on transdate, it must be a composite index:
CREATE INDEX myindex ON custtrans(transdate, amountcur)
But maybe it is a bit useless to create an index just for a single select statement.
One option is to create an index on the column used in the WHERE clause (this is useful if you want to retrieve only about 10-15% of the rows via the indexed column).
Another option is to partition your table if it has millions of rows; a sketch follows below. In either case, though, if you try to retrieve 70-80% of the data, it won't help.
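A hedged illustration of range partitioning on transdate (the partition layout is made up for the example, and a real table would carry its remaining columns as well):

CREATE TABLE custtrans_part (
    transdate DATE,
    amountcur NUMBER
)
PARTITION BY RANGE (transdate) (
    PARTITION p2013 VALUES LESS THAN (TO_DATE('2014-01-01', 'yyyy-mm-dd')),
    PARTITION pmax  VALUES LESS THAN (MAXVALUE)
);

With a filter on transdate, Oracle can then prune to the relevant partitions instead of scanning the whole table.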
The best option is first to analyze your requirements and then make a choice.
Whenever you deal with dates, it's better to use the to_date() function. Do not rely on implicit data type conversion.

Index created but doesn't speed up the retrieval process

I have created a table as below:
create table T1(num varchar2(20))
then I inserted 3 lakh (300,000) numbers into the above table, so now it looks like below:
num
1
2
3
.
.
300000
Now if I do
select * from T1
then it takes 1 min 15 sec to fetch all the records. Since I created an index on column num, I expected the below query to fetch the 300,000 records faster, but it also takes 1 min 15 sec:
select * from T1 where num between '1' and '300000'
So how has the index improved my retrieval process?
The index does not improve the retrieval process when you are trying to fetch all rows.
The index makes it possible to find a subset of rows much more quickly.
An index can help if you want to retrieve a few rows from a large table. But since you retrieve all rows and since your index contains all the columns of your table, it won't speed up the query.
Furthermore, you don't tell us what tool you use to retrieve the data. I guess you use SQL Developer or Toad. So what you measure is the time it takes SQL Developer or Toad to store 300,000 rows in memory in such a way that they can be easily displayed on screen in a scrollable table. You aren't really measuring how long it takes to retrieve them.
To test the effect of having an index in place, you might want to try a query such as
SELECT *
FROM T1
WHERE NUM IN ('288888', '188888', '88888')
both with the index in place, and again after removing the index. You should also collect statistics on the table prior to running the query with the index in place, or you may still get a query which performs a full table scan; a sketch of the statistics call follows below. Share and enjoy.
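A minimal sketch of that statistics step, using Oracle's standard DBMS_STATS package (current schema assumed):

BEGIN
    -- gather optimizer statistics for T1 so the planner can cost the index properly
    DBMS_STATS.GATHER_TABLE_STATS(ownname => USER, tabname => 'T1');
END;
/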

Oracle Indexing & SP Performance

I'm trying to optimize some legacy SQL SPs against an Oracle view which is built from 6 tables, each joined on the same field, a numeric ID. Some of the tables in the view have an index which is solely this ID field; others do not.
If I create an index on the remaining tables in the view using this field only, and then perform the actual select query using this field as the sole parameter, will it improve performance notably? I can post the stored procedure if it's necessary, as there may be other flaws in the SP which may not be solved by indexing alone. The query in question takes around 6 seconds to return 1 row; none of the tables contains a large number of records, nothing over 100,000 records anyway.
Thanks in advance,
Scott
Make sure every table in the view has an index that starts with the ID field. The index can contain more fields as long as the ID field is first.
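A minimal sketch of that advice (the table and column names are placeholders for the six tables behind the view):

CREATE INDEX idx_table1_id ON table1 (id);
-- a composite index also qualifies, as long as the ID column leads:
CREATE INDEX idx_table2_id_more ON table2 (id, other_col);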
If that doesn't help performance please post the select statement and the explain plan.
If the ID field is the first column in the indexes (or the only column) then adding indexes to the ID column of those remaining tables that need them will improve the query if you are returning a small number of rows.