Postgres left outer join appears to not be using table indices - sql

Let me know if this should be posted on DBA.stackexchange.com instead...
I have the following query:
SELECT DISTINCT "court_cases".*
FROM "court_cases"
LEFT OUTER JOIN service_of_processes
ON service_of_processes.court_case_id = court_cases.id
LEFT OUTER JOIN jobs
ON jobs.service_of_process_id = service_of_processes.id
WHERE
(jobs.account_id = 250093
OR court_cases.account_id = 250093)
ORDER BY
court_cases.court_date DESC NULLS LAST,
court_cases.id DESC
LIMIT 30
OFFSET 0;
But it takes a good 2-4 seconds to run, and in a web application this is unacceptable for a single query.
I ran EXPLAIN (ANALYZE, BUFFERS) on the query as suggested on the PostgreSQL wiki, and have put the results here: http://explain.depesz.com/s/Yn6
The table definitions for those tables involved in the query is here (including the indexes on foreign key relationships):
http://sqlfiddle.com/#!15/114c6
Is it having issues using the indexes because the WHERE clause is querying from two different tables? What kind of index or change to the query can I make to make this run faster?
These are the current sizes of the tables in question:
PSQL=# select count(*) from service_of_processes;
count
--------
103787
(1 row)
PSQL=# select count(*) from jobs;
count
--------
108995
(1 row)
PSQL=# select count(*) from court_cases;
count
-------
84410
(1 row)
EDIT: I'm on Postgresql 9.3.1, if that matters.

or clauses can make optimizing a query difficult. One idea is to split the two parts of the query into two separate subqueries. This actually simplifies one of them a lot (the one on court_cases.account_id).
Try this version:
(SELECT cc.*
FROM "court_cases" cc
WHERE cc.account_id = 250093
ORDER BY cc.court_date DESC NULLS LAST,
cc.id DESC
LIMIT 30
) UNION ALL
(SELECT cc.*
FROM "court_cases" cc LEFT OUTER JOIN
service_of_processes sop
ON sop.court_case_id = cc.id LEFT OUTER JOIN
jobs j
ON j.service_of_process_id = sop.id
WHERE (j.account_id = 250093 AND cc.account_id <> 250093)
ORDER BY cc.court_date DESC NULLS LAST, id DESC
LIMIT 30
)
ORDER BY court_date DESC NULLS LAST,
id DESC
LIMIT 30 OFFSET 0;
And add the following indexes:
create index court_cases_accountid_courtdate_id on court_cases(account_id, court_date, id);
create index jobs_accountid_sop on jobs(account_id, service_of_process_id);
Note that the second query uses and cc.count_id <> 250093, which prevents duplicate records. This eliminates the need for distinct or for union (without union all).

I'll try modifying the query as the following:
SELECT DISTINCT "court_cases".*
FROM "court_cases"
LEFT OUTER JOIN service_of_processes
ON service_of_processes.court_case_id = court_cases.id
LEFT OUTER JOIN jobs
ON jobs.service_of_process_id = service_of_processes.id and jobs.account_id = 250093
WHERE
(court_cases.account_id = 250093)
ORDER BY
court_cases.court_date DESC NULLS LAST,
court_cases.id DESC
LIMIT 30
OFFSET 0;
I think that the issue is in the fact that the where filter is not properly decomposed by query planner optimizer, a really strange performance bug

Related

Limit number of rows fetching in a left outer join in Oracle

I'm going to create a data model in oracle fusion applications. I need to create column End_date based on two tables in the query. So I used two methods.
Using a subquery:
SELECT *
FROM (SELECT projects_A.end_date
FROM projects_A, projects_B
WHERE projects_A.p_id = projects_B.p_id
AND rownum = 1)
Using a LEFT OUTER JOIN:
SELECT projects_A.end_date
FROM projects_A
LEFT JOIN projects_B
ON projects_A.p_id = projects_B.p_id
WHERE rownum = 1
Here when I used a subquery, the query returns the results as expected. But when I use left outer join with WHERE rownum = 1 the result is zero. Without WHERE rownum = 1 it retrieves all the results. But I want only the first result. So how can I do that using left outer join? Thank you.
Looks like you want to bring a non-null end_date value(So, add NULLS LAST), but the sorting order is not determined yet(you might add a DESC to the end of the ORDER BY clause depending on this fact ), and use FETCH clause(the DB version is 12c+ as understood from the comment) with ONLY option to exclude ties as yo want to bring only single row.
So, you can use the following query :
SELECT A.end_date
FROM projects_A A
LEFT JOIN projects_B B
ON A.p_id = B.p_id
ORDER BY end_date NULLS LAST
FETCH FIRST 1 ROW ONLY

Limit Query Result Using Count

I need to limit the results of my query so that it only pulls results where the total number of lines on the ID is less than 4, and am unsure how to do this without losing the select statement columns.
select fje.journalID, fjei.ItemID, fjei.acccount, fjei.debit, fjei.credit
from JournalEntry fje
inner join JournalEntryItem fjei on fjei.journalID = fje.journalID
inner join JournalEntryItem fjei2 on fjei.journalID = fjei2.journalID and
fjei.ItemID != fjei2.ItemID
order by fje.journalID
So if journalID 1 has 5 lines, it should be excluded, but if it has 4 lines, I should see it in my query. Just need a push in the right direction. Thanks!
A subquery with an alias has many names, but it's effectively a table. In your case, you would do something like this.
select your fields
from your tables
join (
select id, count(*) records
from wherever
group by id ) derivedTable on someTable.id = derivedTable.id
and records < 4

Optimizing aggregate function in Oracle

I have a query for pulling customer information, and I'm adding an max() function to find the most recent order date. Without the aggregate the query takes .23 seconds to run, but with it it takes 12.75 seconds.
Here's the query:
SELECT U.SEQ, MAX(O.ORDER_DATE) FROM CUST_MST U
INNER JOIN ORD_MST O ON U.SEQ = O.CUST_NUM
WHERE U.SEQ = :customerNumber
GROUP BY U.SEQ;
ORD_MST is a table with 890,000 records.
Is there a more efficient way to get this functionality?
EDIT: For the record, there's nothing specifically stopping me from running two queries and joining them in my program. I find it incredibly odd that such a simple query would take this long to run. In this case it is much cleaner/easier to let the database do the joining of information, but it's not the only way for me to get it done.
EDIT 2: As requested, here are the plans for the queries I reference in this question.
With Aggregate
Without Aggregate
the problem with your query is that you join both tables completely, then the max function is executed against the whole result, and at last the where statement filters your rows.
you have improve the join, by just joining the rows with the certain custid instead of the full tables, should look like this:
SELECT U.SEQ, MAX(O.ORDER_DATE) FROM
(SELECT * FROM CUST_MST WHERE SEQ = :customerNumber ) U
INNER JOIN
(SELECT * FROM ORD_MST WHERE CUST_NUM = :customerNumber) O ON U.SEQ = O.CUST_NUM
GROUP BY U.SEQ;
Another option is to use an order by and filter the first rownum. its not rly the clean way. Could be faster, if not you will also need a subselect to not order the full tables. Didnt use oracle for a while but it should look something like this:
SELECT * FROM
(
SELECT U.SEQ, O.ORDER_DATE FROM CUST_MST U
INNER JOIN ORD_MST O ON U.SEQ = O.CUST_NUM
WHERE U.SEQ = :customerNumber
GROUP BY U.SEQ;
ORDER BY O.ORDER_DATE DESC
)
WHERE ROWNUM = 1
Are you forced to use the join for some reason or why dont you select directly from ORD_MST without join?
EDIT
One more idea:
SELECT * FROM
(SELECT CUST_NUM, MAX(ORDER_DATE) FROM ORD_MST WHERE CUST_NUM = :customerNumber GROUP BY CUST_NUM) O
INNER JOIN CUST_MST U ON O.CUST_NUM = U.SEQ
if the inner select just takes one second, then the join should work instant.
Run this commands:
Explain plan for
SELECT U.SEQ, MAX(O.ORDER_DATE) FROM CUST_MST U
INNER JOIN ORD_MST O ON U.SEQ = O.CUST_NUM
WHERE U.SEQ = :customerNumber
GROUP BY U.SEQ;
select * from table( dbms_xplan.display );
and post results here.
Whithout knowing an execution plan we can only guess what really happens.
Btw. my feeling is that adding composite index for ORD_MST table with columns cust_num+order_date could solve the problem (assuming that SEQ is primary key for CUST_MST table and it has already an unique index). Try:
CREATE INDEX idx_name ON ORD_MST( cust_num, order_date );
Also, after creating the index refresh statistics with commands:
EXEC DBMS_STATS.gather_table_stats('your-schema-name', 'CUST_MST');
EXEC DBMS_STATS.gather_table_stats('your-schema-name', 'ORD_MST');
try your query.

How can you add 2 joins in a subquery?

I am trying to get information from 3 tables in my database. I am trying to get 4 fields. 'kioskid', 'kioskhours', 'videotime', 'sessiontime'. In order to do this, i am trying a join in a subquery. This is what I have so far:
SELECT k.kioskid, k.hours, v.time, s.time
FROM `nsixty_kiosks` as k
LEFT JOIN (SELECT time
FROM `nsixty_videos`
ORDER BY videoid) as v
ON kioskid = k.kioskid LEFT JOIN
(SELECT kioskid, time
FROM `sessions`
ORDER BY pingid desc LIMIT 1) as s ON s.kioskid = k.kioskid
WHERE hours is NOT NULL
When I run this query, it works but it shows every row instead of just showing the last row of each kiosk id. Which is meant to show based on the line 'ORDER BY pingid desc LIMIT 1'.
Any body have some ideas?
Instead of joining to s, you can use a correlated subquery:
SELECT k.kioskid,
k.hours,
v.time,
( SELECT time
FROM sessions
WHERE sessions.kioskid = k.kioskid
ORDER
BY pingid DESC
LIMIT 1
)
FROM nsixty_kiosks AS k
LEFT
JOIN ( SELECT time
FROM `nsixty_videos`
ORDER BY videoid
) AS v
ON kioskid = k.kioskid
WHERE hours IS NOT NULL
;
N.B. I didn't fix your LEFT JOIN (...) AS v, because I don't understand what it's trying to do, but it too is broken; the ON clause doesn't refer to any of its columns, and there's no point in having an ORDER BY in a subquery unless you also have a LIMIT or whatnot in there.
Well, your join on the 'v' subquery doesn't actually reference the 'v' subquery, nor does the 'v' subquery even contain a kioskid field to JOIN on, so that's undoubtedly part of the problem.
To go much further we'd need to see schema and sample data.

"Select Top 10, then Join Tables", instead of "Select Top 10 from Joined Tables"

I have inherited a stored procedure which performs joins across eight tables, some of which contain hundreds of thousands of rows, then selects the top ten entries from the result of that join.
I have enough information at the start of the procedure to select those ten rows from a single table, and then perform those joins on those ten rows, rather than on hundreds of thousands of intermediate rows.
How do I select those top ten rows and then only do joins on those ten rows, instead of performing joins all of the thousands of rows in the table?
I should try:
SELECT * FROM
(SELECT TOP 10 * FROM your_table
ORDER BY your_condition) p
INNER JOIN second_table t
ON p.field = t.field
The optimizer may not be able to perform the top 10 first if you have inner joins, since it can't be sure that the inner joins won't exclude rows later on. It would be a bug if it selected 10 rows from the main table, and then only returned 7 rows at the end because of a join. Using Marco's rewrite may gain you performance for this reason since you're expressly stating that it's safe to limit the rows before the joins.
If you're query is sufficiently complicated, the query plan optimizer may run out of time finding a good plan. It's only given a few hundred milliseconds, and with even a few joins there are probably thousands of different ways it can execute the query (different join orders, etc). If this is the case, you'll benefit from storing the first 10 rows in a temp table first, and then using that later like this:
select top 10 *
into #MainResults
from MyTable
order by your_condition;
select *
from #MainResults r
join othertable t
on t.whatever = r.whatever;
I've seen cases where this second approach has made a HUGE difference.
You can also use a CTE to define the top X and then use it
For example this data.se query limits only to top 40 tags
with top40 as (
select top 40 t.id, t.tagname
from tags t, posttags pt
where pt.tagid = t.id
group by t.tagname, t.id
order by count(pt.postid) desc
),
myanswers as(
select p.parentid, p.score
from posts p
where
p.owneruserid = ##UserID## and
p.communityowneddate is null
)
select t40.tagname as 'Tag', sum(p1.score) as 'Score',
case when sum(p1.score) >= 15 then ':-)' else ':-(' end as 'Status'
from top40 t40, myanswers p1, posttags pt1
where
pt1.postid = p1.parentid and
pt1.tagid = t40.id
group by t40.tagname
order by sum(p1.score) desc