I have two tables, ClaimPaymentHistory and RemittanceHistory, which I am currently joining with the following query:
select rh."EventHistory"
from "ClaimPaymentHistory" ph, jsonb_array_elements(ph."EventHistory") payments
inner join "RemittanceHistory" rh
on payments->> 'rk' = rh."RemittanceRefKey"::text
where ph."ClaimRefKey" = #ClaimRefKey
I wanted to improve this query using the following index:
CREATE INDEX claim_payment_history_gin_idx ON "ClaimPaymentHistory"
USING gin ("EventHistory" jsonb_path_ops)
But I don't appear to get any improvement with this. However, I can see this index being leveraged if I query the EventHistory column of that table using the @> (contains) operator, for example like so:
select * from "ClaimPaymentHistory" where "EventHistory" @> '[{"rk": 637453920516771103}]';
So my question is, am I able to create a join using that contains operator? I've been playing with the syntax but can't get anything to work.
If I am unable to create a join with that operator, what would be my best options for indexing?
That index could be used if you wrote the query like this:
select rh."EventHistory"
from "RemittanceHistory" rh join "ClaimPaymentHistory" ph
on ph."EventHistory" #> jsonb_build_array(jsonb_build_object('rk',rh."RemittanceRefKey"))
where ph."ClaimRefKey" = 5;
However, this is unlikely to have good performance unless "RemittanceHistory" has few rows in it.
...what would be my best options for indexing?
The obvious choice, if you don't have them already, would be regular (btree) indexes on rh."RemittanceRefKey" and ph."ClaimRefKey".
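For example (index names here are just placeholders):
CREATE INDEX remittance_history_remittancerefkey_idx ON "RemittanceHistory" ("RemittanceRefKey");
CREATE INDEX claim_payment_history_claimrefkey_idx ON "ClaimPaymentHistory" ("ClaimRefKey");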
Also, look at (and show us) the EXPLAIN (ANALYZE, BUFFERS) for the original query you want to make faster.
I wound up refactoring the table structure. Instead of a join through RemittanceRefKey, I added a JSONB column to RemittanceHistory called ClaimRefKeys. This is simply an array of integer values, and now I can look up the desired rows with:
select "EventHistory" from "RemittanceHistory" where "ClaimRefKeys" @> #ClaimRefKey;
This combined with the following index gives pretty fantastic performance.
CREATE INDEX remittance_history_claimrefkeys_gin_idx ON "RemittanceHistory" USING gin ("ClaimRefKeys" jsonb_path_ops);
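For illustration, with a hypothetical literal key value the lookup looks like this; the right-hand side is a JSONB array, which is what allows the jsonb_path_ops index to be used:
select "EventHistory" from "RemittanceHistory" where "ClaimRefKeys" @> '[12345]';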
Related
I face the following issue with SQLite.
I have a view that is supposed to normalize the table (convert names to numbers). For that purpose it requires many left joins, and it becomes really slow without indexes. However, the indexes I create are not used by the query, which I can see from the query plan.
The view:
SELECT
STORENAME.ID AS _STORENAME_ID,
ARTICLEID.ID AS _ARTICLEID_ID,
ARTICLENAME.ID AS _ARTICLENAME_ID,
SECTION.ID AS _SECTION_ID,
....
SALESQUANTITY,
SALESAMOUNT
FROM
_SO_BUFFER_SHACOOP AS MAIN
LEFT JOIN _t_so_branch AS STORENAME ON MAIN.STORENAME = STORENAME.NAME
LEFT JOIN _t_so_itemnumber AS ARTICLEID ON MAIN.ARTICLEID = ARTICLEID.NAME
LEFT JOIN _t_so_itemname AS ARTICLENAME ON MAIN.ARTICLENAME = ARTICLENAME.NAME
LEFT JOIN _t_so_sections AS SECTION ON MAIN.SECTION = SECTION.NAME
...
WHERE
NOT (
salesamount is null or (ARTICLENAME is null and (storename is null or storename = 'TOTAL'))
)
When I explain this query, I can see the following:
SQLite is using SCAN TABLE instead of SEARCH TABLE for _t_so_itemnumber (ARTICLEID). That is really weird, since I have two indexes, one on the MAIN table and one on the _t_so_itemnumber table (see the indexes below).
SQLite is not using any of my indexes; it uses automatic indexes only.
I have the following indexes (among many others):
CREATE INDEX _ix_tsoitemnumber__itemname on _t_so_itemnumber(name)
CREATE INDEX _ix_sobufffershacoop_articleid ON _SO_BUFFER_SHACOOP(ARTICLEID)
I am not sure what's wrong here. Why is it ignoring the indexes that were created to cover this join?
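For reference, the plan can be inspected in SQLite with EXPLAIN QUERY PLAN, along these lines (view body abbreviated to the join in question):
EXPLAIN QUERY PLAN
SELECT MAIN.ARTICLEID, ARTICLEID.ID
FROM _SO_BUFFER_SHACOOP AS MAIN
LEFT JOIN _t_so_itemnumber AS ARTICLEID ON MAIN.ARTICLEID = ARTICLEID.NAME;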
I am trying to improve query performance using an index:
SELECT E.CLIENT, E.TITLE
FROM (SELECT D.CLIENT, D.TITLE, COUNT('X') TEMPORADAS
FROM (SELECT A.CLIENT, A.TITLE
FROM (SELECT CLIENT,TITLE,SEASON, COUNT('X') N_EPISODIOS
FROM LIC_SERIES GROUP BY CLIENT,TITLE,SEASON) A
JOIN SEASONS B ON (A.TITLE=B.TITLE AND A.SEASON=B.SEASON AND A.N_EPISODIOS=B.EPISODES)) D
GROUP BY D.CLIENT, D.TITLE) E
JOIN SERIES F ON (E.TITLE=F.TITLE AND E.TEMPORADAS=F.TOTAL_SEASONS)
The main idea was to create two indexes on the columns that appear in the ON clause of the innermost query (the deepest JOIN):
DROP INDEX INDEX_TAPS_MOVIES;
DROP INDEX INDEX_CASTS;
CREATE INDEX INDEX_TAPS_MOVIES
ON TAPS_MOVIES(TITLE);
CREATE INDEX INDEX_CONTRACTS
ON CASTS(TITLE);
But after analysing the results, there is no improvement in the load time... I have tried forcing it with hints, but I get even worse performance. What could be the key to getting the index used in these types of queries?
Is there a better way to optimize it rather than an index?
Kindest Regards
The Oracle optimizer does not always use indexes. It evaluates different execution plans and chooses the one it estimates to be cheapest; sometimes not using an index gives better performance.
To improve performance you can parallelize work on your table (see parallelism in Oracle).
To make a table eligible for parallel execution, use the following:
ALTER TABLE YourTableName PARALLEL 4;
Or you can use a hint as follows:
/*+ PARALLEL(4) */
Be careful about the degree of parallelism.
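For illustration, the hint goes right after the SELECT keyword; applied, for example, to the innermost aggregation from the question it would read:
SELECT /*+ PARALLEL(4) */ CLIENT, TITLE, SEASON, COUNT('X') N_EPISODIOS
FROM LIC_SERIES
GROUP BY CLIENT, TITLE, SEASON;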
I have the following query:
select i.pkey as instrument_pkey,
p.asof,
p.price,
p.lastprice as lastprice,
p.settlementprice as settlement_price,
p.snaptime,
p.owner as source_id,
i.type as instrument_type
from quotes_mxsequities p,
instruments i,
(select instrument, maxbackdays
from TABLE(cast (:H_ARRAY as R_TAB))) lbd
where p.asof between :ASOF - lbd.maxbackdays and :ASOF
and p.instrument = lbd.instrument
and p.owner = :SOURCE_ID
and p.instrument = i.pkey
Since I started using the table function, the query has been doing a full table scan on quotes_mxsequities, which is a large table.
Earlier, when I used an IN clause instead of the table function, the index was being used.
Any suggestion on how to enforce index usage?
EDIT:
I will try to get the explain plan, but just to add: H_ARRAY is expected to have around 10k entries. quotes_mxsequities is a large table with millions of rows. Instruments is again a large table, but has fewer rows than quotes_mxsequities.
The full table scan is happening on quotes_mxsequities, while instruments is using an index.
Quite difficult to answer with no explain plan and no information about the table structures, number of rows, etc.
As a general, simplified approach, you could try to force the use of an index with the INDEX hint.
Your problem could also be due to a wrong order of table processing; you can try to make Oracle follow the right order (I suppose LBD first) with the LEADING hint.
Another point could be the full access, while you probably need a NESTED LOOP join; in that case you can try the USE_NL hint.
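A sketch of how those hints could be combined on the query from the question (only the hint comment is new, the select list is shortened, and INDEX(p) without an index name lets Oracle pick any index on quotes_mxsequities):
select /*+ LEADING(lbd) USE_NL(p) INDEX(p) */
       i.pkey as instrument_pkey,
       p.asof,
       p.price
from quotes_mxsequities p,
     instruments i,
     (select instrument, maxbackdays
        from TABLE(cast (:H_ARRAY as R_TAB))) lbd
where p.asof between :ASOF - lbd.maxbackdays and :ASOF
  and p.instrument = lbd.instrument
  and p.owner = :SOURCE_ID
  and p.instrument = i.pkey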
It's hard to be sure from the limited information provided, but it looks like this is an issue with the optimiser not being able to establish the cardinality of the table collection expression, since its contents aren't known at parse time. With a stored nested table the statistics would be available, but here there are none for it to use.
Without that information the optimiser defaults to guessing your table collection will have 8K entries, and uses that as the cardinality estimate; if that is a significant proportion of the number of rows in quotes_mxsequities then it will decide the index isn't going to be efficient, and will use a full table scan.
You can use the undocumented cardinality hint to tell the optimiser roughly how many elements you actually expect in the collection; you presumably won't know exactly, but you might know you usually expect around 10. So you could add a hint:
select /*+ CARDINALITY(lbd, 10) */ i.pkey as instrument_pkey,
You may also find the dynamic sampling hint useful here, but without your real data to experiment with, the cardinality hint is the easier one to demonstrate, since its effect on the basic execution plan is easy to see.
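For reference, a dynamic sampling variant might look like the following; the sampling level shown is an arbitrary illustration:
select /*+ dynamic_sampling(lbd 5) */ i.pkey as instrument_pkey,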
Incidentally, you don't need the subquery on the table expression, you can simplify slightly to:
from TABLE(cast (:H_ARRAY as R_TAB)) lbd,
quotes_mxsequities p,
instruments i
or, even better, use modern join syntax:
select /*+ CARDINALITY(lbd, 10) */ i.pkey as instrument_pkey,
p.asof,
p.price,
p.lastprice as lastprice,
p.settlementprice as settlement_price,
p.snaptime,
p.owner as source_id,
i.type as instrument_type
from TABLE(cast (:H_ARRAY as R_TAB)) lbd
join quotes_mxsequities p
on p.asof between :ASOF - lbd.maxbackdays and :ASOF
and p.instrument = lbd.instrument
join instruments i
on i.pkey = p.instrument
where p.owner = :SOURCE_ID;
Assume you have a JOIN with a WHERE:
SELECT *
FROM partners
JOIN orders
ON partners.partner_id = orders.partner_id
WHERE orders.date
BETWEEN 20140401 AND 20140501
1) An index on partner_id in both tables will speed up the JOIN, right?
2) An index on orders.date will speed up the WHERE clause?
3) But as far as I know, one SELECT cannot use more than one index. So which one will be used?
This is your query, with the quoting fixed (and assuming orders.date is really a date type):
SELECT *
FROM partners JOIN
orders
ON partners.partner_id = orders.partner_id
WHERE orders.date BETWEEN '2014-04-01' AND '2014-05-01';
For an inner join, there are basically two execution strategies. The engine can start with the partners table and find all matches in orders, or it can start with orders and find all matches in partners. (There are then different algorithms that can be used.)
For the first approach, the only index that would help is orders(partner_id, date). For the second approach, the best index is orders(date, partner_id). Note that these are not equivalent.
In most scenarios like this, I would expect the orders table to be larger and the filtering to be important. That would suggest that the best execution plan is to start with the orders table and filter it first, using the second option.
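For reference, assuming the filter column really is orders.date as in the question (index names are placeholders), those two alternatives would be created along these lines:
CREATE INDEX orders_partner_date_idx ON orders (partner_id, date);
CREATE INDEX orders_date_partner_idx ON orders (date, partner_id);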
To start, an index is used for an operator, not for the SELECT statement as a whole. Therefore one index can be used for reading data from the partners table and another index to get data from the orders table.
I think that the best strategy in this case would be to have a clustered index on partners.partner_id and a non-clustered index on (orders.partner_id, orders.date).
See the following sample case:
SELECT *
FROM [dbo].[LUEducation] E
JOIN LUCitizen C On C.skCitizen = E.skCitizen
WHERE C.skCitizen <= 100
AND E.skSchool = 26069
Execution plan (screenshot not reproduced here): the SQL engine uses more than one index at a time.
Without knowing which DBMS you are using it's difficult to know what execution plan the optimizer is going to choose.
Here's a typical one:
1. Do a range scan on orders.date, using a sorted index for that purpose.
2. Do a loop join on the results, doing one lookup on partners.partner_id for each entry, using the index on that field.
In this plan, an index on orders.partner_id will not be used.
However, if the WHERE clause were not there, you might see an execution plan that
does a merge join using the indexes on partners.partner_id and
orders.partner_id.
This terminology may be confusing, because the documentation for your DBMS may use different terms.
One select can only use one index per table (index-merge is an exception).
You pointed out the right indexes in your question.
You don't really need an index on orders.partner_id for this query, but it is necessary for foreign key constraints and for joins in the other direction.
I'm trying to optimise my query; it has an inner join and a coalesce.
The join table is simply a table with one integer field, and I've added a unique key.
For my WHERE clause I've created a key on the three fields.
But when I look at the plan it still says it's using a table scan.
Where am I going wrong?
Here's my query:
select date(a.startdate, '+'||(b.n*a.interval)||' '||a.intervaltype) as due
from billsndeposits a
inner join util_nums b on date(a.startdate, '+'||(b.n*a.interval)||' '||a.intervaltype) <= coalesce(a.enddate, date('2013-02-26'))
where not (intervaltype = 'once' or interval = 0) and factid = 1
order by due, pid;
Most likely your JOIN expression cannot use any index, so it is evaluated by doing a full scan and computing date(a.startdate, '+'||(b.n*a.interval)||' '||a.intervaltype) for every row.
BTW: that is a really weird join condition in itself. I suggest you find a better way to join billsndeposits to util_nums (if that is actually needed).
I think I understand what you are trying to achieve, but this kind of join is a recipe for slow performance. Even if you remove the date computations and the coalesce (i.e. compare one date against another), it will still be slow (compared to integer joins) even with an index. And because you are creating new dates on the fly, you cannot index them.
I suggest creating a temp table with two columns: (1) pid (or whatever id you use in billsndeposits) and (2) recurrence_dt.
Populate the new table using this query:
INSERT INTO TEMP
SELECT PID, date(a.startdate, '+'||(b.n*a.interval)||' '||a.intervaltype)
FROM billsndeposits a, util_nums b;
Then create an index on the recurrence_dt column and refresh statistics (runstats). Now your select statement can look like this:
SELECT recurrence_dt
FROM temp t, billsndeposits a
WHERE t.pid = a.pid
AND recurrence_dt <= coalesce(a.enddate, date('2013-02-26'))
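For completeness, the index-and-statistics step mentioned above could look like this in SQLite syntax (the index name is a placeholder; ANALYZE is SQLite's counterpart to runstats):
CREATE INDEX temp_recurrence_dt_idx ON temp (recurrence_dt);
ANALYZE temp;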
You can add an exp_ts column to this new table and expire the temporary data afterwards.
I know this adds more work to your original query, but it is a guaranteed performance improvement and should fit naturally in a script that runs frequently.
Regards,
Edit
Another thing I would do is give enddate a default value of date('2013-02-26'), unless that would affect other code and/or not make business sense. This way you don't have to work with coalesce.