DISTINCT causing the query to be slow - sql

I have a Postgres query that takes almost 7 seconds. It joins two tables with a WHERE clause; with DISTINCT it takes 7 seconds, and without DISTINCT I get the result in 500 ms. I even applied an index, but it did not help. How can I tune the query for better performance?
select distinct RES.*
from ACT_RU_TASK RES
inner join ACT_RU_IDENTITYLINK I on I.TASK_ID_ = RES.ID_
where RES.ASSIGNEE_ is null
  and I.TYPE_ = 'candidate'
  and I.GROUP_ID_ in ('us1', 'us2')
order by RES.priority_ desc
LIMIT 10 OFFSET 0
For every RES.ID_ there are two matching I.TASK_ID_ rows, so I need only unique records.

Instead of using DISTINCT, use EXISTS:
select RES.*
from ACT_RU_TASK RES
where exists (select 1
              from ACT_RU_IDENTITYLINK I
              where I.TASK_ID_ = RES.ID_ and
                    I.TYPE_ = 'candidate' and
                    I.GROUP_ID_ in ('us1', 'us2')
             ) and
      RES.ASSIGNEE_ is null
order by RES.priority_ desc
LIMIT 10 OFFSET 0;
For this query, you want an index on ACT_RU_IDENTITYLINK(TASK_ID_, TYPE_, GROUP_ID_). It is also possible that an index on ACT_RU_TASK(ASSIGNEE_, priority_, ID_) could be used.
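As a sketch, those suggestions translate to Postgres DDL along these lines (the index names are made up for illustration):
create index act_ru_identitylink_task_type_group_idx
    on ACT_RU_IDENTITYLINK (TASK_ID_, TYPE_, GROUP_ID_);
create index act_ru_task_assignee_priority_idx
    on ACT_RU_TASK (ASSIGNEE_, PRIORITY_, ID_);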

"ORA-00923: FROM keyword not found where expected\n what should I fix

I have an Oracle query as follows, but when I make changes for pagination the results are different. What should I pass in my code?
SELECT *
FROM (
    SELECT b.*,
           ROWNUM r__
    FROM (
        select a.KODE_KLAIM,
               a.NO_SS,
               a.LA,
               a.NAMA_TK,
               a.KODE_K,
               ( select tk.TEM_LAHIR
                 from KN.VW_KN_TK tk
                 where tk.KODE_K = a.KODE_K and rownum = 1
               ) TEM_LAHIR,
               ( select TO_CHAR(tk.TLAHIR, 'DD/MM/RRRR')
                 from KN.VW_KTK tk
                 where tk.KODE_K = a.KODE_K
                   and rownum = 1
               ) TLAHIR
        from PN.KLAIM a
        where nvl(a.STATUS_BATAL, 'X') = 'T'
          and A.NOMOR IS NOT NULL
          and A.TIPE_KLAIM = 'JPN01'
    ) b
)
where 1 = 1
WHERE ROWNUM < ( ( ? * ? ) + 1 )
WHERE r__ >= ( ( ( ? - 1 ) * ? ) + 1 )
But when I run this query I get ORA-00900: invalid SQL statement.
You have three WHERE clauses at the end (and no ORDER BY clause). To make it syntactically valid you could change the second and third WHERE clauses to AND.
However, you mention pagination so what you probably want is to use:
SELECT *
FROM (
    SELECT b.*,
           ROWNUM r__
    FROM (
        select ...
        from ...
        ORDER BY something
    ) b
    WHERE ROWNUM < :page_size * :page_number + 1
)
WHERE r__ >= ( :page_number - 1 ) * :page_size + 1
Note: You can replace the named bind variables with anonymous bind variables if you want.
Or, if you are using Oracle 12 or later then you can use the OFFSET x ROWS FETCH FIRST y ROWS ONLY syntax:
select ...
from ...
ORDER BY something
OFFSET (:page_number - 1) * :page_size ROWS
FETCH FIRST :page_size ROWS ONLY;
Additionally, you have several correlated sub-queries such as:
select tk.TEM_LAHIR
from KN.VW_KN_TK tk
where tk.KODE_K = a.KODE_K and rownum=1
This will find the first matching row that the SQL engine happens to read from the datafile and is effectively finding a random row. If you want a specific row then you need an ORDER BY clause and you need to filter using ROWNUM AFTER the ORDER BY clause has been applied.
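Before Oracle 12, one option (not shown in the original answer) is a KEEP (DENSE_RANK FIRST ...) aggregate, which picks a deterministic row inside a correlated sub-query without needing ROWNUM at all:
select MAX(tk.TEM_LAHIR) KEEP (DENSE_RANK FIRST ORDER BY something)
from KN.VW_KN_TK tk
where tk.KODE_K = a.KODE_K
Here "something" stands for whatever column defines which row should come first, as in the examples above.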
From Oracle 12, the correlated sub-query would be:
select tk.TEM_LAHIR
from KN.VW_KN_TK tk
where tk.KODE_K = a.KODE_K
ORDER BY something
FETCH FIRST ROW ONLY

Offset and fetch in Oracle SQL Developer

We have data in the millions (1,698,393 rows in total), and exporting it as text takes 4 hours. I need to know if there is a way to reduce the export time for that many records from an Oracle database using SQL Developer.
with cte as (
    select *
    from (
        select distinct
               system_serial_number,
               ( select s.system_status
                 from eim_pr_system s
                 where s.system_serial_number = a.system_serial_number
               ) system_status,
               ( select SN.cmat_customer_id
                 from EIM.eim_pr_ib_latest SN
                 where SN.role_id = 19
                   and SN.system_serial_number = a.system_serial_number
               ) SN_cmat_customer_id,
               ( select EC.cmat_customer_id
                 from EIM.eim_pr_ib_latest EC
                 where EC.role_id = 1
                   and a.system_serial_number = EC.system_serial_number
               ) EC_cmat_customer_id
        from EIM.eim_pr_ib_latest a
        where a.role_id in (1, 19)
    )
    where nvl(SN_cmat_customer_id, 0) != nvl(EC_cmat_customer_id, 0)
)
select system_serial_number,
       system_status,
       SN_CMAT_Customer_ID,
       EC_CMAT_Customer_ID,
       C.Customer_Name SN_Customer_Name,
       D.Customer_Name EC_Customer_Name
from cte,
     eim.eim_party C,
     eim.eim_party D
where C.CMAT_Customer_ID = SN_cmat_customer_id
  and D.CMAT_Customer_ID = EC_cmat_customer_id
offset 5001 rows fetch next 200000 rows only;
You can get rid of a lot of the joins and correlated sub-queries (which will speed things up by reducing the number of table scans) by doing something like:
SELECT a.system_serial_number,
       s.system_status,
       a.SN_cmat_customer_id,
       a.EC_cmat_customer_id,
       a.SN_customer_name,
       a.EC_customer_name
FROM (
    SELECT l.system_serial_number,
           MAX( CASE l.role_id WHEN 19 THEN l.cmat_customer_id END ) AS SN_cmat_customer_id,
           MAX( CASE l.role_id WHEN 1  THEN l.cmat_customer_id END ) AS EC_cmat_customer_id,
           MAX( CASE l.role_id WHEN 19 THEN p.customer_name END ) AS SN_customer_name,
           MAX( CASE l.role_id WHEN 1  THEN p.customer_name END ) AS EC_customer_name
    FROM EIM.eim_pr_ib_latest l
    INNER JOIN EIM.eim_party p
        ON ( p.CMAT_Customer_ID = l.cmat_customer_id )
    WHERE l.role_id IN ( 1, 19 )
    GROUP BY l.system_serial_number
    HAVING NVL( MAX( CASE l.role_id WHEN 19 THEN l.cmat_customer_id END ), 0 )
        <> NVL( MAX( CASE l.role_id WHEN 1 THEN l.cmat_customer_id END ), 0 )
) a
LEFT OUTER JOIN eim_pr_system s
    ON ( s.system_serial_number = a.system_serial_number )
Since your original query is not throwing an ORA-01427 (single-row subquery returns more than one row) error on the correlated sub-queries, I am assuming your data is such that only a single row is returned for each correlated query, so the query above should reflect your output (although without some sample data it is difficult to test).
Apart from making the query faster, there is a way to achieve a faster export using SQL Developer.
When you use the data grid's export feature, it executes the query again. The only time this won't happen is if you have fetched ALL the rows into the grid. Doing that for very large data sets will be 'expensive' on the client side, but you can avoid it.
For a faster export, add a /*csv*/ comment to your select and wrap the statement with a spool c:\my_file.csv - then collapse the script output panel and run it with F5. As the data is fetched, it will be written to that file in CSV format. Other format hints are available too:
/*csv*/
/*xml*/
/*json*/
/*html*/
/*insert*/
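Put together, a minimal script might look like this (the file path and table name are placeholders):
spool c:\my_file.csv
select /*csv*/ * from your_table;  -- your full query goes here
spool off
Run it as a script (F5) and the rows are written straight to the file as they are fetched.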
I talk about this feature in detail here.

ROW_NUMBER() Query Plan SORT Optimization

The query below accesses the Votes table, which contains over 30 million rows. The result set is then filtered with WHERE n = 1. In the query plan, the SORT operation for the ROW_NUMBER() window function is 95% of the query's cost, and execution takes over 6 minutes.
I already have an index on (same_voter, eid, country) INCLUDE (vid, nid, sid, vote, time_stamp, new) to cover the WHERE clause.
Is the most efficient fix to add an index on (vid, nid, sid, new DESC, time_stamp DESC), or is there an alternative to the ROW_NUMBER() function that achieves the same results more efficiently?
SELECT v.vid, v.nid, v.sid, v.vote, v.time_stamp, v.new, v.eid,
       ROW_NUMBER() OVER (
           PARTITION BY v.vid, v.nid, v.sid ORDER BY v.new DESC, v.time_stamp DESC) AS n
FROM dbo.Votes v
WHERE v.same_voter <> 1
  AND v.eid <= @EId
  AND v.eid > (@EId - 5)
  AND v.country = @Country
One possible alternative to using ROW_NUMBER():
SELECT
    V.vid,
    V.nid,
    V.sid,
    V.vote,
    V.time_stamp,
    V.new,
    V.eid
FROM
    dbo.Votes V
    LEFT OUTER JOIN dbo.Votes V2 ON
        V2.vid = V.vid AND
        V2.nid = V.nid AND
        V2.sid = V.sid AND
        V2.same_voter <> 1 AND
        V2.eid <= @EId AND
        V2.eid > (@EId - 5) AND
        V2.country = @Country AND
        (V2.new > V.new OR (V2.new = V.new AND V2.time_stamp > V.time_stamp))
WHERE
    V.same_voter <> 1 AND
    V.eid <= @EId AND
    V.eid > (@EId - 5) AND
    V.country = @Country AND
    V2.vid IS NULL
The query basically says: get all rows matching your criteria, then join to any other rows that match the same criteria but would be ranked higher within the partition based on the new and time_stamp columns. If no such row is found then this must be the row you want (it's ranked highest), and in that case V2.vid will be NULL. I'm assuming that vid can never otherwise be NULL; if it's a NULLable column in your table then you'll need to adjust that last line of the query.
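To support both sides of that self-join, an index along these lines might help (the name is made up, and the exact column order is a guess based on the predicates):
CREATE INDEX IX_Votes_country_eid_rank
    ON dbo.Votes (country, eid, vid, nid, sid, new DESC, time_stamp DESC)
    INCLUDE (vote, same_voter);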

PostgreSQL: use CASE WHEN result in WHERE clause

I use a complex CASE WHEN for selecting values. I would like to use this result in the WHERE clause, but Postgres says column 'd' does not exist.
SELECT id, name, case when complex_with_subqueries_and_multiple_when END AS d
FROM table t WHERE d IS NOT NULL
LIMIT 100 OFFSET 100;
Then I thought I can use it like this:
select * from (
    SELECT id, name, case when complex_with_subqueries_and_multiple_when END AS d
    FROM table t
    LIMIT 100 OFFSET 100
) t
WHERE d IS NOT NULL;
But now I do not get 100 rows in the result. I could probably apply LIMIT and OFFSET outside the subquery with the CASE statement (where the WHERE clause is), but I suspect (I am not sure why) this would hurt performance.
The CASE returns an array or NULL. What is the best/fastest way to exclude rows when the result of the CASE statement is NULL? I need 100 rows (or fewer if they don't exist, of course). I am using Postgres 9.4.
Edited:
SELECT count(*) OVER() AS count, t.id, t.size, t.price, t.location, t.user_id,
       p.city, t.price_type, ht.value AS houses_type_value,
       ST_X(t.coordinates) AS x, ST_Y(t.coordinates) AS y,
       CASE WHEN t.classification = 'public' THEN
                ARRAY[(SELECT i.filename FROM table_images i WHERE i.table_id = t.id ORDER BY i.weight ASC LIMIT 1), t.description]
            WHEN t.classification = 'protected' THEN
                ARRAY[(SELECT i.filename FROM table_images i WHERE i.table_id = t.id ORDER BY i.weight ASC LIMIT 1), t.description]
            WHEN t.id IN (SELECT rl.table_id FROM table_private_list rl WHERE rl.owner_id = t.user_id AND rl.user_id = 41026) THEN
                ARRAY[(SELECT i.filename FROM table_images i WHERE i.table_id = t.id ORDER BY i.weight ASC LIMIT 1), t.description]
            ELSE null
       END AS main_image_description
FROM table t
LEFT JOIN table_modes m ON m.id = t.mode_id
LEFT JOIN table_types y ON y.id = t.type_id
LEFT JOIN post_codes p ON p.id = t.post_code_id
LEFT JOIN table_houses_types ht ON ht.id = t.houses_type_id
WHERE datetime_sold IS NULL
  AND datetime_deleted IS NULL
  AND t.published = true
  AND coordinates IS NOT NULL
  AND coordinates && ST_MakeEnvelope(17.831490030182, 44.404640972306, 12.151558389557, 47.837396630872)
  AND main_image_description IS NOT NULL
GROUP BY t.id, m.value, y.value, p.city, ht.value
ORDER BY t.id LIMIT 100 OFFSET 0
To use the CASE WHEN result in the WHERE clause you need to wrap it in a subquery, as you did, or in a view.
SELECT * FROM (
    SELECT id, name, CASE
                         WHEN name = 'foo' THEN true
                         WHEN name = 'bar' THEN false
                         ELSE NULL
                     END AS c
    FROM case_in_where
) t WHERE c IS NOT NULL
With a table containing rows (1, 'foo'), (2, 'bar'), (3, 'baz') this will return records 1 and 2. I don't know how long this SQL Fiddle will persist, but here is an example: http://sqlfiddle.com/#!15/1d3b4/3 . Also see https://stackoverflow.com/a/7950920/101151
Your LIMIT returns fewer than 100 rows when the 100 rows starting at offset 100 contain records for which d evaluates to NULL. I don't know of a way to limit the subselect without re-writing your limiting logic (the CASE statements) to work inside the WHERE clause:
WHERE ... AND (
t.classification='public' OR t.classification='protected'
OR t.id IN (SELECT rl.table_id ... rl.user_id=41026))
The WHERE version will read differently from the CASE, and it may be annoying to keep the CASE logic in sync with the WHERE conditions, but it allows your limits to operate only on matching data.
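Alternatively, reusing the simplified example from above as a sketch: filter in the outer query and apply LIMIT/OFFSET only after the filter, which always yields full pages of non-NULL rows:
SELECT * FROM (
    SELECT id, name, CASE
                         WHEN name = 'foo' THEN true
                         WHEN name = 'bar' THEN false
                         ELSE NULL
                     END AS c
    FROM case_in_where
) t
WHERE c IS NOT NULL
LIMIT 100 OFFSET 100;
The trade-off is that Postgres must evaluate the CASE (including its sub-queries) for every row it scans before the limit applies, which is why pushing the conditions into the WHERE clause, as shown above, is usually faster.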

Fastest way to check if the most recent result for a patient has a certain value

MSSQL < 2005
I have a complex database with lots of tables, but for now only the patient table and the measurements table matter.
What I need is the number of patients whose most recent value of 'code' matches a certain value. Also, datemeasurement has to be after '2012-04-01'. I have solved this in two different ways:
SELECT COUNT(P.patid)
FROM T_Patients P
WHERE P.patid IN (SELECT patid
                  FROM T_Measurements M
                  WHERE (M.code = 'xxxx' AND result = 'xx')
                    AND datemeasurement = (SELECT MAX(datemeasurement)
                                           FROM T_Measurements
                                           WHERE datemeasurement > '2012-01-04'
                                             AND patid = M.patid
                                           GROUP BY patid)
                  GROUP BY patid)
AND:
SELECT COUNT(P.patid)
FROM T_Patient P
WHERE 1 = (SELECT TOP 1 CASE WHEN result = 'xx' THEN 1 ELSE 0 END
           FROM T_Measurements M
           WHERE M.code = 'xxxx'
             AND datemeasurement > '2012-01-04'
             AND patid = P.patid
           ORDER BY datemeasurement DESC
          )
This works just fine, but it makes the query incredibly slow because it has to run the correlated subquery for every row of the outer table (if you know what I mean). The query takes 10 seconds without the most-recent check, and 3 minutes with it.
I'm pretty sure this can be done a lot more efficiently, so please enlighten me if you will :).
I tried implementing HAVING datemeasurement = MAX(datemeasurement) but that keeps throwing errors at me.
My approach would be to write a query that first gets each patient's last measurement since '2012-01-04', and then filters that for your codes and results. Something like:
select count(1)
from T_Measurements M
inner join (
    select patid, MAX(datemeasurement) as lastMeasuredDate
    from T_Measurements
    where datemeasurement > '2012-01-04'
    group by patid
) lastMeasurements
    on lastMeasurements.lastMeasuredDate = M.datemeasurement
   and lastMeasurements.patid = M.patid
where M.code = 'Xxxx' and M.result = 'XX'
The fastest way may be to use row_number():
SELECT COUNT(m.patid)
FROM (select m.*,
             ROW_NUMBER() over (partition by patid order by datemeasurement desc) as seqnum
      FROM T_Measurements m
      where datemeasurement > '2012-01-04'
     ) m
WHERE seqnum = 1 and code = 'xxxx' and result = 'xx'
Row_number() enumerates the records for each patient, so the most recent gets a value of 1. The outer query then just selects and counts those rows.
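If it is still slow, an index along these lines could let the engine find each patient's latest rows without a large sort (the index name is made up; INCLUDE, like ROW_NUMBER(), assumes SQL Server 2005 or later):
CREATE INDEX IX_T_Measurements_patid_date
    ON T_Measurements (patid, datemeasurement DESC)
    INCLUDE (code, result);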