Cannot use sorted index in join query - Ignite

I have a query:
SELECT pr."productId" FROM "ProductRecord" pr
INNER JOIN "ProductCategory" pc on pr."productId" = pc."productId"
WHERE pc."category" = ? AND pc."language" = ?
ORDER BY pr."dailyRock" DESC
There is a descending index on the dailyRock field, but because the productId index is used for the join, the sorted index cannot be used and the joined result has to be sorted every time.
Can I force Ignite to use both the productId index and the dailyRock index of the ProductRecord table in the query above?

No, only one index per table may be used in each query.
You can try creating a composite index (dailyRock, productId) or (productId, dailyRock) although I'm not sure it'll help to avoid the sorting.

Related

Speed up LEFT OUTER JOIN query in Firebird

The question is for Firebird 2.5. Let's assume we have the following query:
SELECT
EVENTS.ID,
EVENTS.TS,
EVENTS.DEV_TS,
EVENTS.COMPLETE_TS,
EVENTS.OBJ_ID,
EVENTS.OBJ_CODE,
EVENTS.SIGNAL_CODE,
EVENTS.SIGNAL_EVENT,
EVENTS.REACTION,
EVENTS.PROT_TYPE,
EVENTS.GROUP_CODE,
EVENTS.DEV_TYPE,
EVENTS.DEV_CODE,
EVENTS.SIGNAL_LEVEL,
EVENTS.SIGNAL_INFO,
EVENTS.USER_ID,
EVENTS.MEDIA_ID,
SIGNALS.ID AS SIGNAL_ID,
SIGNALS.SIGNAL_TYPE,
SIGNALS.IMAGE AS SIGNAL_IMAGE,
SIGNALS.NAME AS SIGNAL_NAME,
REACTION.INFO,
USERS.NAME AS USER_NAME
FROM EVENTS
LEFT OUTER JOIN SIGNALS ON (EVENTS.SIGNAL_ID = SIGNALS.ID)
LEFT OUTER JOIN REACTION ON (EVENTS.ID = REACTION.EVENTS_ID)
LEFT OUTER JOIN USERS ON (EVENTS.USER_ID = USERS.ID)
WHERE (TS BETWEEN '27.07.2021 00:00:00' AND '28.07.2021 10:34:08')
AND (OBJ_ID = 8973)
AND (DEV_CODE IN (0, 1234))
AND (DEV_TYPE = 79)
AND (PROT_TYPE = 8)
ORDER BY TS;
EVENTS has about 190 million records by now and this query takes too much time to complete. As I read here, the tables have to have indexes on all the columns that are used.
Here are the CREATE INDEX statements for the EVENTS table:
CREATE INDEX FK_EVENTS_OBJ ON EVENTS (OBJ_ID);
CREATE INDEX FK_EVENTS_SIGNALS ON EVENTS (SIGNAL_ID);
CREATE INDEX IDX_EVENTS_COMPLETE_TS ON EVENTS (COMPLETE_TS);
CREATE INDEX IDX_EVENTS_OBJ_SIGNAL_TS ON EVENTS (OBJ_ID,SIGNAL_ID,TS);
CREATE INDEX IDX_EVENTS_TS ON EVENTS (TS);
Here is the data from the PLAN analyzer:
PLAN JOIN (JOIN (JOIN (EVENTS ORDER IDX_EVENTS_TS INDEX (FK_EVENTS_OBJ, IDX_EVENTS_TS), SIGNALS INDEX (PK_SIGNALS)), REACTION INDEX (IDX_REACTION_EVENTS)), USERS INDEX (PK_USERS))
As requested the speed of the execution:
without LEFT JOIN -> 138ms
with LEFT JOIN -> 338ms
Is there another way to speed up the execution of the query besides indexing the columns or maybe add another index?
If I add another index will the optimizer choose to use it?

You can only optimize the joins themselves by being sure that the keys are indexed in the second tables. These all look like primary keys, so they should have appropriate indexes.
For this WHERE clause:
WHERE TS BETWEEN '27.07.2021 00:00:00' AND '28.07.2021 10:34:08' AND
OBJ_ID = 8973 AND
DEV_CODE IN (0, 1234) AND
DEV_TYPE = 79 AND
PROT_TYPE = 8
You probably want an index on (OBJ_ID, DEV_TYPE, PROT_TYPE, TS, DEV_CODE). The order of the first three keys is not particularly important because they are all equality comparisons. I am guessing that one day of data is fewer rows than two device codes.
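To check that an index with this column order actually gets picked up, here is a minimal sketch in Python, using the standard sqlite3 module as a stand-in for Firebird. That is an assumption: Firebird's optimizer and its PLAN output differ. The index name IDX_EVENTS_FILTER, the sample rows, and the ISO-formatted timestamps are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE EVENTS (ID INTEGER PRIMARY KEY, TS TEXT, OBJ_ID INTEGER,"
    " DEV_CODE INTEGER, DEV_TYPE INTEGER, PROT_TYPE INTEGER)"
)
# Equality columns first, then the range column TS, then the IN column.
# IDX_EVENTS_FILTER is a hypothetical name.
conn.execute(
    "CREATE INDEX IDX_EVENTS_FILTER"
    " ON EVENTS (OBJ_ID, DEV_TYPE, PROT_TYPE, TS, DEV_CODE)"
)
rows = [
    (1, "2021-07-27 08:00:00", 8973, 0, 79, 8),     # matches
    (2, "2021-07-28 09:00:00", 8973, 1234, 79, 8),  # matches
    (3, "2021-07-28 11:00:00", 8973, 0, 79, 8),     # after the TS range
    (4, "2021-07-27 12:00:00", 8973, 55, 79, 8),    # other DEV_CODE
    (5, "2021-07-27 12:00:00", 1, 0, 79, 8),        # other OBJ_ID
]
conn.executemany("INSERT INTO EVENTS VALUES (?, ?, ?, ?, ?, ?)", rows)

QUERY = (
    "SELECT ID FROM EVENTS"
    " WHERE TS BETWEEN '2021-07-27 00:00:00' AND '2021-07-28 10:34:08'"
    " AND OBJ_ID = 8973 AND DEV_CODE IN (0, 1234)"
    " AND DEV_TYPE = 79 AND PROT_TYPE = 8"
    " ORDER BY TS"
)
matching_ids = [row[0] for row in conn.execute(QUERY)]
# The last column of each EXPLAIN QUERY PLAN row describes one plan step.
plan = " | ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + QUERY))
print(matching_ids)
print(plan)
```

On Firebird itself, the equivalent check would be to create the index and compare the PLAN output before and after.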

First of all you want to find the table1 rows quickly. You are using several columns in your WHERE clause to get them. Provide an index on these columns. Which column is the most selective? I.e. which criterion narrows the result rows most? Let's say it's dt, so we put this first:
create index idx1 on table1 (dt, oid, pt, ts, dc);
I have put ts and dc last, because we are looking for more than one value in these columns. It may still be that putting ts or dc as the first column is a good choice. Sometimes we have to play around with this, i.e. provide several indexes with the column order changed and then see which one gets used by the DBMS.
Tables table2 and table4 are accessed by their primary keys, for which an index exists. But table3 is accessed by t1id. So provide an index on that, too:
create index idx2 on table3 (t1id);

What index do I need?

I have performance problems with this query. If I remove the ORDER BY clause, everything works well, but I really need it. I have tried many indexes without any result. Can you help me, please?
SELECT *
FROM "refuel_request" AS "refuel_request"
LEFT OUTER JOIN "user" AS "user" ON "refuel_request"."user_id" = "user"."user_id"
LEFT OUTER JOIN "bill_qr" AS "bill_qr" ON "refuel_request"."bill_qr_id" = "bill_qr"."bill_qr_id"
LEFT OUTER JOIN "car" AS "order.car" ON "refuel_request"."car_id" = "order.car"."car_id"
LEFT OUTER JOIN "refuel_request_status" AS "refuel_request_status" ON "refuel_request"."refuel_request_status_id" = "refuel_request_status"."refuel_request_status_id"
WHERE
refuel_request."refuel_request_status_id" IN ( '1', '2', '3')
ORDER BY "refuel_request".created_at desc
LIMIT 10
Here is the EXPLAIN of this query:
EXPLAIN (ANALYZE, BUFFERS)
Primary Keys and/or Foreign Keys
pk_refuel_request_id
refuel_request_bill_qr_id_fkey
refuel_request_user_id_fkey

All outer joined tables are 1:n related to refuel_request. This means your query is looking for the last ten created refuel requests with status 1 to 3.
You are outer joining the tables because not every refuel_request is related to a user, a bill_qr, a car, and a status. Or you outer join mistakenly. Anyway, none of the joins changes the number of retrieved rows; it's still one row per refuel request. In order to join the other tables' rows the DBMS just needs their primary key indexes. Nothing to worry about.
The only thing we must care about is finding the top refuel_request rows for the statuses you are interested in as quickly as possible.
Use a partial index that only contains data for the statuses in question. The column you index is the created_at column, so as to get the top 10 immediately.
CREATE INDEX idx ON refuel_request (created_at DESC)
WHERE refuel_request_status_id IN (1, 2, 3);
Partial indexes are explained here: https://www.postgresql.org/docs/current/indexes-partial.html

You cannot have an index that supports both the WHERE condition and the ORDER BY, because you are using IN and not =.
The fastest option is to split the query into three parts, so that each part compares refuel_request.refuel_request_status_id with =. Combine these three queries with UNION ALL. Each of the queries has ORDER BY and LIMIT 10, and you wrap the whole thing in an outer query that has another ORDER BY and LIMIT 10.
Then you need these indexes:
CREATE INDEX ON refuel_request (refuel_request_status_id, created_at);
CREATE INDEX ON "user" (user_id);
CREATE INDEX ON bill_qr (bill_qr_id);
CREATE INDEX ON car (car_id);
CREATE INDEX ON refuel_request_status (refuel_request_status_id);
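The UNION ALL rewrite described above can be sketched as follows. This is a minimal illustration in Python with the standard sqlite3 module standing in for PostgreSQL (an assumption; the SQL shape is the point, not the engine). The table is a made-up miniature of refuel_request, and the index name idx_status_created and the helper names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE refuel_request ("
    " refuel_request_id INTEGER PRIMARY KEY,"
    " refuel_request_status_id INTEGER NOT NULL,"
    " created_at INTEGER NOT NULL)"
)
# Equality column first, sort column second (idx_status_created is a made-up name).
conn.execute(
    "CREATE INDEX idx_status_created"
    " ON refuel_request (refuel_request_status_id, created_at)"
)
# 200 rows, statuses cycle through 1..5, created_at is unique per row.
conn.executemany(
    "INSERT INTO refuel_request VALUES (?, ?, ?)",
    [(i, i % 5 + 1, 1000 + i) for i in range(1, 201)],
)

def top_n_per_status(status_ids, n):
    """One '= ? ORDER BY ... LIMIT n' branch per status, merged and cut to n again."""
    branch = (
        "SELECT * FROM (SELECT refuel_request_id, created_at"
        " FROM refuel_request WHERE refuel_request_status_id = ?"
        " ORDER BY created_at DESC LIMIT ?)"
    )
    union = " UNION ALL ".join([branch] * len(status_ids))
    sql = (
        "SELECT refuel_request_id, created_at FROM (" + union + ")"
        " ORDER BY created_at DESC LIMIT ?"
    )
    params = [p for status in status_ids for p in (status, n)] + [n]
    return conn.execute(sql, params).fetchall()

def top_n_with_in(status_ids, n):
    """The original shape of the query, for comparison."""
    marks = ", ".join("?" * len(status_ids))
    sql = (
        "SELECT refuel_request_id, created_at FROM refuel_request"
        f" WHERE refuel_request_status_id IN ({marks})"
        " ORDER BY created_at DESC LIMIT ?"
    )
    return conn.execute(sql, [*status_ids, n]).fetchall()

print(top_n_per_status([1, 2, 3], 10) == top_n_with_in([1, 2, 3], 10))
```

Each branch reads at most n rows, so the outer sort only ever sees 3 * n rows no matter how large the table is.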

You need at least the indexes for the joins (do you really need LEFT joins?)
LEFT OUTER JOIN "user" AS "user" ON "refuel_request"."user_id" = "user"."user_id"
So, refuel_request.user_id must be in the index
LEFT OUTER JOIN "bill_qr" AS "bill_qr" ON "refuel_request"."bill_qr_id" =
LEFT OUTER JOIN "car" AS "order.car" ON "refuel_request"."car_id" =
bill_qr_id and car_id too
LEFT OUTER JOIN "refuel_request_status" AS "refuel_request_status" ON "refuel_request"."refuel_request_status_id" =
and refuel_request_status_id
WHERE
refuel_request."refuel_request_status_id" IN ( '1', '2', '3')
refuel_request_status_id must be the first key in the index as we need it in the WHERE
ORDER BY "refuel_request".created_at desc
and then created_at since it's in the ORDER BY clause. This will not improve performance per se, but will allow the ORDER BY to run without requiring access to the table data, which is the same reason we put the other non-WHERE columns in there. Of course a partial index is even better: we move the WHERE into the index predicate and use created_at for the rest (the LIMIT 10 now means that we can do without the extra columns in the index, since retrieving three 1:N rows costs very little; in a different situation we might find it useful to keep those extra columns).
So one index that contains, in this order:
refuel_request_status_id, created_at, bill_qr_id, car_id, user_id
(refuel_request_status_id for the WHERE, created_at for the ORDER BY, the remaining columns for the JOINs)
However, do you really need a SELECT *? I believe you'd get better performance if you only included the fields you're really going to use.

The most effective index for this query would be on refuel_request (refuel_request_status_id, created_at DESC) so that both the main filtering and the ordering can be done using the index. You also want indexes on the columns you're joining, but those tables are small and inconsequential at the moment. In any case, the index I suggest isn't actually going to help much with the performance pain points you're having right now. Here are some suggestions:
Don't use SELECT * unless you really need all of the columns from all of these tables you're joining. Specifying only the necessary columns means postgres can load less data into memory, and work over it faster.
Postgres is spending a lot of time on the joins, joining about a million rows each time, when you're really only interested in ten of those rows. We can encourage it to do the order/limit first by rearranging the query somewhat:
WITH refuel_request_subset AS MATERIALIZED (
SELECT *
FROM refuel_request
WHERE refuel_request_status_id IN ('1', '2', '3')
ORDER BY created_at DESC
LIMIT 10
)
SELECT *
FROM refuel_request_subset AS refuel_request
LEFT OUTER JOIN "user" ON refuel_request.user_id = "user".user_id
LEFT OUTER JOIN bill_qr ON refuel_request.bill_qr_id = bill_qr.bill_qr_id
LEFT OUTER JOIN car AS "order.car" ON refuel_request.car_id = "order.car".car_id
LEFT OUTER JOIN refuel_request_status ON refuel_request.refuel_request_status_id = refuel_request_status.refuel_request_status_id;
Note: This assumes that the LEFT JOINS will not add rows to the result set, as is the case with your current dataset.
This trick only really works if you have a fixed number of IDs, but you can do the refuel_request_subset query separately for each ID and then UNION the results, as opposed to using the IN operator. That would allow postgres to fully use the index mentioned above.

How do you determine the order of columns in a (filtered) index for a query with joins?

I have a query that is hard coded in an application, in old ANSI style, but I hope to optimise with an appropriate filtered index.
SELECT HISTORY.UNITS
FROM branch, detail, HISTORY --WITH(INDEX(test))
WHERE
HISTORY.DETAIL_ID = detail.ID
AND HISTORY.BRANCH_ID = branch.ID
AND HISTORY.WEEK_SELECTOR=123456
AND (branch.BRANCH_CODE='016')
AND (detail.CODE='01308054')
I have tried a filtered index on HISTORY and switched the columns around to no avail. Branch and Detail are small tables but HISTORY is very large.
CREATE UNIQUE NONCLUSTERED INDEX [test] ON [HISTORY]
(
[WEEK_SELECTOR] ASC,
[DETAIL_ID] ASC,
[BRANCH_ID] ASC
)
INCLUDE ([UNITS])
WHERE ([week_selector]=(123456))
When I force it to try to use the index I get:
Msg 8622, Level 16, State 1, Line 1 Query processor could not produce
a query plan because of the hints defined in this query. Resubmit the
query without specifying any hints and without using SET FORCEPLAN.
Please can someone tell me how to work out the correct order, taking the JOINs and WHERE clause into consideration?

SELECT HISTORY.UNITS
FROM HISTORY
INNER JOIN branch
ON branch.ID = HISTORY.BRANCH_ID
INNER JOIN detail
ON detail.ID = HISTORY.DETAIL_ID
WHERE
HISTORY.WEEK_SELECTOR = 123456
AND branch.BRANCH_CODE = '016'
AND detail.CODE = '01308054';
I would personally never use query hints in a production environment. Indexes I'd use:
detail(CODE, ID)
branch(BRANCH_CODE, ID)
HISTORY(WEEK_SELECTOR, BRANCH_ID, DETAIL_ID) with included UNITS column
I'm not certain about your data, but you might have to swap around DETAIL_ID and BRANCH_ID for better performance gains.

How to write a SQL Query in a better way for optimization?

When I execute the query below, it takes nearly 4 minutes.
This is the query
SELECT
transactionsEntry.StoreID StoreID,
items.ItemLookupCode ItemLookupCode,
SUM(transactionsEntry.Quantity)
FROM
[HQMatajer].[dbo].[TransactionEntry] transactionsEntry
RIGHT JOIN
[HQMatajer].[dbo].[Transaction] transactions ON transactionsEntry.TransactionNumber = transactions.TransactionNumber
INNER JOIN
[HQMatajer].[dbo].[Item] items ON transactionsEntry.ItemID = items.ID
WHERE
YEAR(transactions.Time) = 2015
AND MONTH(transactions.Time) = 1
GROUP BY
transactionsEntry.StoreID, items.ItemLookupCode
ORDER BY
items.ItemLookupCode
TransactionEntry table may have 90 billion records, Transaction table has 30 billion records, item table has 40 k records.
Estimated cost: the plan shows 84%, and it is on a clustered index.

Avoid function calls - they prevent usage of indexes. Try
Where transactions.Time >= '2015-01-01'
and transactions.Time < '2015-02-01'
If you don't have an index on the column transactions.Time then add an index for this column.
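Here is a small sketch of the difference, in Python with the standard sqlite3 module standing in for SQL Server (an assumption: strftime plays the role of YEAR() and MONTH(), and the plan text is SQLite's, not SQL Server's). The table is a made-up miniature of Transaction with Time stored as ISO text, and the index name idx_transaction_time is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE "Transaction" (TransactionNumber INTEGER PRIMARY KEY, Time TEXT)'
)
conn.execute('CREATE INDEX idx_transaction_time ON "Transaction" (Time)')
conn.executemany(
    'INSERT INTO "Transaction" VALUES (?, ?)',
    [
        (1, "2014-12-31 23:59:59"),
        (2, "2015-01-01 00:00:00"),
        (3, "2015-01-31 23:59:59"),
        (4, "2015-02-01 00:00:00"),
    ],
)

# Function applied to the column: every row has to be evaluated.
BY_FUNCTION = (
    'SELECT TransactionNumber FROM "Transaction"'
    " WHERE strftime('%Y', Time) = '2015' AND strftime('%m', Time) = '01'"
)
# Half-open range on the bare column: the index can be searched.
BY_RANGE = (
    'SELECT TransactionNumber FROM "Transaction"'
    " WHERE Time >= '2015-01-01' AND Time < '2015-02-01'"
)

def run(sql):
    """Return the matching transaction numbers in ascending order."""
    return sorted(row[0] for row in conn.execute(sql))

def plan(sql):
    """Return SQLite's plan description for the statement as one string."""
    return " | ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

print(run(BY_FUNCTION), plan(BY_FUNCTION))
print(run(BY_RANGE), plan(BY_RANGE))
```

Both forms return the same rows, but only the range form lets the engine seek into the index instead of reading every entry.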

Create indexes on every table and column that you use in your query.
That is the best way to get the result quickly.
For example:
Create index for Time and TransactionNumber in Transaction Table
Create index for TransactionNumber, ItemID and StoreID in TransactionEntry Table
Create index for ItemID in Item Table.
Please visit this site, where you can learn the basics of query tuning and SQL optimization.

You need to prevent the Clustered Index Scan.
I recommend creating a Covering Index on [transactionsentry]:
Key Columns:[TransactionNumber],[ItemID]
Include:[StoreID]
Also try this Index on [Transaction]:
Key Columns:[time],[TransactionNumber]
(Sorry I can't provide greater depth, but I don't know your current Indexing structure)

Try this code; it may help you:
select transactionsEntry.StoreID StoreID,items.ItemLookupCode ItemLookupCode,SUM(transactionsEntry.Quantity)
FROM [HQMatajer].[dbo].[TransactionEntry] transactionsEntry
INNER JOIN [HQMatajer].[dbo].[Item] items ON transactionsEntry.ItemID = items.ID
RIGHT JOIN [HQMatajer].[dbo].[Transaction] transactions ON transactionsEntry.TransactionNumber = transactions.TransactionNumber
Where transactions.[Time] >= '2015-01-01' and transactions.[Time] < '2015-02-01'
GROUP BY transactionsEntry.StoreID,items.ItemLookupCode
ORDER BY items.ItemLookupCode

What columns to index for a JOIN with WHERE

Assume you have a JOIN with a WHERE:
SELECT *
FROM partners
JOIN orders
ON partners.partner_id = orders.partner_id
WHERE orders.date
BETWEEN 20140401 AND 20140501
1) An index on partner_id in both tables will speed up the JOIN, right?
2) An index on orders.date will speed up the WHERE clause?
3) But as far as I know, one SELECT can not use more than one index. So which one will be used?

This is your query, with the quoting fixed (and assuming orders.date is really a date type):
SELECT *
FROM partners JOIN
orders
ON partners.partner_id = orders.partner_id
WHERE orders.date BETWEEN '2014-04-01' AND '2014-05-01';
For an inner join, there are basically two execution strategies. The engine can start with the partners table and find all matches in orders. Or it can start with orders and find all matches in partners. (There are then different algorithms that can be used.)
For the first approach, the only index that would help is orders(partner_id, date). For the second approach, the best index is orders(date, partner_id). Note that these are not equivalent.
In most scenarios like this, I would expect the orders table to be larger and the filtering to be important. That would suggest that the best execution plan is to start with the orders table and filter it first, using the second option.
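To compare the two column orders on your own data, something like the following sketch can help. It uses Python's standard sqlite3 module as a neutral stand-in (an assumption, since the question does not name a DBMS), with made-up rows, a hypothetical index name idx_orders, and dates stored as ISO text. Which plan an engine actually picks depends on its optimizer and statistics, so the plans are only printed, not assumed.

```python
import sqlite3

QUERY = (
    "SELECT partners.partner_id, orders.order_id"
    " FROM partners JOIN orders ON partners.partner_id = orders.partner_id"
    " WHERE orders.date BETWEEN '2014-04-01' AND '2014-05-01'"
    " ORDER BY orders.order_id"
)

def build(index_columns):
    """Create a tiny partners/orders database with one index on orders."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE partners (partner_id INTEGER PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE orders (order_id INTEGER PRIMARY KEY,"
        " partner_id INTEGER, date TEXT)"
    )
    conn.execute(f"CREATE INDEX idx_orders ON orders ({index_columns})")
    conn.executemany("INSERT INTO partners VALUES (?)", [(1,), (2,)])
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [
            (10, 1, "2014-03-31"),  # before the range
            (11, 1, "2014-04-15"),  # inside the range
            (12, 2, "2014-05-01"),  # on the upper bound (BETWEEN is inclusive)
            (13, 2, "2014-05-02"),  # after the range
        ],
    )
    return conn

def run(conn):
    return conn.execute(QUERY).fetchall()

def plan(conn):
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + QUERY)]

partner_first = build("partner_id, date")  # suits: start with partners
date_first = build("date, partner_id")     # suits: start with orders
for label, conn in (("partner_id, date", partner_first),
                    ("date, partner_id", date_first)):
    print(label, run(conn), plan(conn))
```

The result is the same either way; only the access path changes, which is why it is worth looking at the plan rather than guessing.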

To start, an index is used for an operator not for the SELECT statement. Therefore one index will be used for reading data from the partner table and another index could be used to get data from orders table.
I think that the best strategy in this case would be to have a clustered index on partners.partner_id and one non-clustered index on orders.partner_id and orders.date

Consider this sample case:
SELECT *
FROM [dbo].[LUEducation] E
JOIN LUCitizen C On C.skCitizen = E.skCitizen
WHERE C.skCitizen <= 100
AND E.skSchool = 26069
The execution plan shows that the SQL engine uses more than one index at a time.

Without knowing which DBMS you are using it's difficult to know what execution plan the optimizer is going to choose.
Here's a typical one:
Do a range scan on orders.date, using a sorted index for that purpose.
Do a loop join on the results, doing one lookup on partners.partner_id for each entry, using the index on that field.
In this plan, an index on orders.partner_id will not be used.
However, if the WHERE clause were not there, you might see an execution plan that does a merge join using the indexes on partners.partner_id and orders.partner_id.
This terminology may be confusing, because the documentation for your DBMS may use different terms.

One select can only use one index per table (index-merge is an exception).
You pointed out the right indexes in your question.
You don't really need an index on orders.partner_id for this query,
but it is necessary for foreign key constraints and join in other direction.
