This join works for what I'm trying to do, but it is very slow and seems very inefficient.
Basically, the innermost query tells me when appeal_audit.price has changed (by comparing each price with the previous one via LAG), the middle one keeps only the rows where it changed, and the outermost one keeps only the earliest change per user, to avoid duplicates when the price changed multiple times.
left join(
select * from (
Select
ROW_NUMBER() over (partition by UserId order by auditdate) as RowNo,
userid,
OldPrice,
auditdate
from (
select
UserId,
Price,
lag(appeal_audit.price) over (partition by appeal_audit.userid order by appeal_audit.auditdate) as OldPrice,
case when appeal_audit.Price <> lag(appeal_audit.price) over (partition by appeal_Audit.userid order by appeal_audit.auditdate) then 1 else 0 end as Flag,
AuditDate
from Appeal_Audit) t4
where t4.flag = 1) t5
where t5.RowNo = 1 ) t6
on t6.userid = userdetails.userid
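To make the three levels concrete, here is a minimal sketch that runs the same logic on a made-up in-memory SQLite table via Python's built-in sqlite3 module (assuming an SQLite build with window functions, i.e. 3.25 or later). One small deviation: LAG is computed once in the innermost query and compared one level up, instead of being evaluated a second time for the flag. The result should be the same, but SQLite's behaviour says nothing about how fast this runs on your database.

```python
import sqlite3

# Made-up data: user 1 changes price twice, user 2 never changes.
ROWS = [
    (1, 10.0, "2024-01-01"),
    (1, 10.0, "2024-01-02"),
    (1, 12.0, "2024-01-03"),  # first change: 10 -> 12
    (1, 15.0, "2024-01-04"),  # second change: dropped by rowno = 1
    (2, 20.0, "2024-01-01"),
    (2, 20.0, "2024-01-05"),
]

QUERY = """
SELECT userid, oldprice, auditdate
FROM (
    SELECT userid, oldprice, auditdate,
           ROW_NUMBER() OVER (PARTITION BY userid ORDER BY auditdate) AS rowno
    FROM (
        SELECT userid, price, auditdate,
               LAG(price) OVER (PARTITION BY userid ORDER BY auditdate) AS oldprice
        FROM appeal_audit
    ) t4
    WHERE t4.price <> t4.oldprice  -- oldprice is NULL on the first row, so it is excluded
) t5
WHERE t5.rowno = 1
"""


def first_price_changes(rows):
    """Return {userid: (old_price, auditdate)} for the earliest price change per user."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE appeal_audit (userid INTEGER, price REAL, auditdate TEXT)")
    con.executemany("INSERT INTO appeal_audit VALUES (?, ?, ?)", rows)
    result = {u: (old, d) for u, old, d in con.execute(QUERY)}
    con.close()
    return result


if __name__ == "__main__":
    print(first_price_changes(ROWS))
```

User 2 never changes price, so it produces no row; in the LEFT JOIN it would simply get NULLs.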
Can someone help me tune this query? It takes about a minute to return the data in SQL Developer.
SELECT
masterid, notification_id, notification_list, typeid,
subject, created_at, created_by, approver, sequence_no,
productid, statusid, updated_by, updated_at, product_list,
notification_status, template, notification_type, classification
FROM
(
SELECT
masterid, notification_id, notification_list, typeid, subject,
approver, created_at, created_by, sequence_no, productid,
statusid, updated_by, updated_at, product_list, notification_status,
template, notification_type, classification,
ROW_NUMBER() OVER(ORDER BY masterid DESC) AS r
FROM
(
SELECT DISTINCT
a.masterid AS masterid,
a.maxid AS notification_id,
notification_list,
typeid,
noti.subject AS subject,
noti.approver AS approver,
noti.created_at AS created_at,
noti.created_by AS created_by,
noti.sequence_no AS sequence_no,
a.productid AS productid,
a.statusid AS statusid,
noti.updated_by AS updated_by,
noti.updated_at AS updated_at,
(
SELECT LISTAGG(p.name,',') WITHIN GROUP(ORDER BY p.id) AS list_noti
FROM product p
INNER JOIN notification_product np ON np.product_id = p.id
WHERE notification_id = a.maxid
) AS product_list,
(
SELECT description
FROM notification_status
WHERE id = a.statusid
) AS notification_status,
(
SELECT name
FROM template
WHERE id = a.templateid
) AS template,
(
SELECT description
FROM notification_type
WHERE id = a.typeid
) AS notification_type,
(
SELECT tc.description
FROM template_classification tc
INNER JOIN notification nt ON tc.id = nt.classification_id
WHERE nt.id = a.maxid
) AS classification
FROM
(
SELECT
nm.id AS masterid,
nm.product_id AS productid,
nm.notification_status_id AS statusid,
nm.template_id AS templateid,
nm.notification_type_id AS typeid,
(
SELECT MAX(id)
FROM notification
WHERE notification_master_id = nm.id
) AS maxid,
(
SELECT LISTAGG(n.id,',') WITHIN GROUP(ORDER BY nf.id) AS list_noti
FROM notification n
WHERE notification_master_id = nm.id
) AS notification_list
FROM notification_master nm
INNER JOIN notification nf ON nm.id = nf.notification_master_id
WHERE nm.disable = 'N'
ORDER BY nm.id DESC
) a
INNER JOIN notification noti
ON a.maxid = noti.id
AND
(
(
(
TO_DATE('01-jan-1970','dd-MM-YYYY') +
numtodsinterval(created_at / 1000,'SECOND')
) <
(current_date + INTERVAL '-21' DAY)
)
OR (typeid IN (2,4) AND statusid = 4)
)
)
)
WHERE r BETWEEN 11 AND 20
DISTINCT is very often an indicator of a badly written query. A normalized database doesn't contain duplicate data, so where do the duplicates come from that you must remove with DISTINCT? Very often it is your own query producing them. Avoid producing duplicates in the first place, so you don't need DISTINCT later.
In your case, you are joining with the table notification in your subquery a, but you are not using its rows in that subquery; you only select from notification_master. The join therefore repeats each master row once per related notification, and these are the duplicates that DISTINCT then removes.
In the end, you want to get notification masters and their latest related notification (by getting its ID first and then selecting the row). You don't need that many subqueries to achieve this.
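A small made-up SQLite example (Python's sqlite3 module) of where the duplicates come from: joining notification_master to notification without selecting any notification column still repeats each master once per notification, whereas aggregating notification per master first (getting the latest ID per master) yields one row per master without any DISTINCT. Table contents are invented for the demo.

```python
import sqlite3


def duplicate_demo():
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE notification_master (id INTEGER PRIMARY KEY, disable TEXT);
        CREATE TABLE notification (id INTEGER PRIMARY KEY, notification_master_id INTEGER);
        INSERT INTO notification_master VALUES (1, 'N'), (2, 'N');
        -- made-up data: master 1 has three notifications, master 2 has one
        INSERT INTO notification VALUES (10, 1), (11, 1), (12, 1), (20, 2);
    """)
    # The join repeats each master once per notification, although no nf column is selected.
    joined = [r[0] for r in con.execute("""
        SELECT nm.id
        FROM notification_master nm
        INNER JOIN notification nf ON nm.id = nf.notification_master_id
        WHERE nm.disable = 'N'
        ORDER BY nm.id
    """)]
    # Aggregating notification per master first gives exactly one row per master.
    grouped = con.execute("""
        SELECT nm.id, agg.maxid
        FROM notification_master nm
        INNER JOIN (
            SELECT notification_master_id, MAX(id) AS maxid
            FROM notification
            GROUP BY notification_master_id
        ) agg ON agg.notification_master_id = nm.id
        WHERE nm.disable = 'N'
        ORDER BY nm.id
    """).fetchall()
    con.close()
    return joined, grouped


if __name__ == "__main__":
    print(duplicate_demo())
```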
Some side notes:
To get the description from template_classification, you are joining with the notification table again, which is not necessary: you already have the latest notification row (noti), so you can use its classification_id directly.
ORDER BY in a subquery (ORDER BY nm.id DESC) is superfluous, because in standard SQL subquery results are unordered. (Oracle sometimes violates this standard in order to apply ROWNUM to the result, but you are not using ROWNUM in your query.)
It's a pity that you store created_at as a number rather than as a DATE or TIMESTAMP. This forces you to convert it before every comparison. I don't think this has a great impact on your query, though, because you are using it in an OR condition.
CURRENT_DATE gets you the current date in the session time zone, i.e. the client's date. This is rarely wanted, as the data you select from the database should relate not to some client's date but to the database's own date, SYSDATE.
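For illustration, the conversion the query has to perform on created_at can be mirrored in Python as below. It assumes, as the division by 1000 suggests, that created_at holds milliseconds since 1970-01-01 with no time zone attached. `now` is passed in explicitly, because the choice between CURRENT_DATE and SYSDATE is exactly the question of which clock it comes from; the function names are hypothetical.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)


def created_at_to_datetime(created_at_ms: int) -> datetime:
    """Analogue of DATE '1970-01-01' + NUMTODSINTERVAL(created_at / 1000, 'SECOND')."""
    return EPOCH + timedelta(milliseconds=created_at_ms)


def older_than_21_days(created_at_ms: int, now: datetime) -> bool:
    """Analogue of: <converted created_at> < <now> + INTERVAL '-21' DAY."""
    return created_at_to_datetime(created_at_ms) < now - timedelta(days=21)


if __name__ == "__main__":
    print(created_at_to_datetime(1_700_000_000_000))
```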
If I am not mistaken, your query can be shortened to:
SELECT
nm.id AS masterid,
nf.id AS notification_id,
nfagg.notification_list AS notification_list,
nm.notification_type_id AS typeid,
nf.subject AS subject,
nf.approver AS approver,
nf.created_at AS created_at,
nf.created_by AS created_by,
nf.sequence_no AS sequence_no,
nm.product_id AS productid,
nm.notification_status_id AS statusid,
nf.updated_by AS updated_by,
nf.updated_at AS updated_at,
(
SELECT LISTAGG(p.name, ',') WITHIN GROUP (ORDER BY p.id)
FROM product p
INNER JOIN notification_product np ON np.product_id = p.id
WHERE np.notification_id = nf.id
) AS product_list,
(
SELECT description
FROM notification_status
WHERE id = nm.notification_status_id
) AS notification_status,
(
SELECT name
FROM template
WHERE id = nm.template_id
) AS template,
(
SELECT description
FROM notification_type
WHERE id = nm.notification_type_id
) AS notification_type,
(
SELECT description
FROM template_classification
WHERE id = nf.classification_id
) AS classification
FROM notification_master nm
INNER JOIN
(
SELECT
notification_master_id,
MAX(id) AS maxid,
LISTAGG(id,',') WITHIN GROUP (ORDER BY id) AS notification_list
FROM notification
GROUP BY notification_master_id
) nfagg ON nfagg.notification_master_id = nm.id
INNER JOIN notification nf
ON nf.id = nfagg.maxid
AND
(
(
DATE '1970-01-01' + NUMTODSINTERVAL(nf.created_at / 1000, 'SECOND')
< CURRENT_DATE + INTERVAL '-21' DAY
)
OR (nm.notification_type_id IN (2,4) AND nm.notification_status_id = 4)
)
WHERE nm.disable = 'N'
ORDER BY nm.id DESC
OFFSET 10 ROWS
FETCH NEXT 10 ROWS ONLY;
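A stripped-down sketch of the rewritten query's shape, run on made-up SQLite data via Python's sqlite3 module: one aggregation pass over notification per master, a join to the latest notification row, then paging. It leaves out the created_at/status filter and the scalar subqueries, uses GROUP_CONCAT as a stand-in for LISTAGG, and LIMIT/OFFSET in place of Oracle's OFFSET ... FETCH NEXT, so it shows the logic only, not Oracle's performance.

```python
import sqlite3


def build_demo_db(masters=25):
    """Made-up data: every master m gets two notifications, IDs m*100+1 and m*100+2."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE notification_master (id INTEGER PRIMARY KEY, disable TEXT)")
    con.execute("CREATE TABLE notification (id INTEGER PRIMARY KEY, notification_master_id INTEGER)")
    for m in range(1, masters + 1):
        con.execute("INSERT INTO notification_master VALUES (?, 'N')", (m,))
        con.executemany("INSERT INTO notification VALUES (?, ?)",
                        [(m * 100 + 1, m), (m * 100 + 2, m)])
    return con


PAGE_QUERY = """
SELECT nm.id AS masterid,
       nf.id AS notification_id,
       nfagg.notification_list
FROM notification_master nm
INNER JOIN (
    -- one pass over notification: latest ID and ID list per master
    SELECT notification_master_id,
           MAX(id) AS maxid,
           GROUP_CONCAT(id, ',') AS notification_list  -- stand-in for LISTAGG (order unspecified)
    FROM notification
    GROUP BY notification_master_id
) nfagg ON nfagg.notification_master_id = nm.id
INNER JOIN notification nf ON nf.id = nfagg.maxid  -- the latest notification row
WHERE nm.disable = 'N'
ORDER BY nm.id DESC
LIMIT ? OFFSET ?  -- SQLite spelling of OFFSET ... FETCH NEXT ... ROWS ONLY
"""


def fetch_page(con, offset, size):
    return con.execute(PAGE_QUERY, (size, offset)).fetchall()


if __name__ == "__main__":
    for row in fetch_page(build_demo_db(), offset=10, size=10):
        print(row)
```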
As mentioned, you may want to replace CURRENT_DATE with SYSDATE.
I recommend the following indexes for the query:
CREATE INDEX idx1 ON notification_master (disable, id, notification_status_id, notification_type_id);
CREATE INDEX idx2 ON notification (notification_master_id, id, created_at);
A last remark on paging: in order to skip n rows to get the next n, the whole query must be executed for all data and all result rows sorted, only to pick n of them at the end. It is usually better to remember the last fetched ID and, in the next execution, select only the rows beyond it (with ORDER BY nm.id DESC, that means rows with a lower ID).
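The remember-the-last-ID approach (often called keyset pagination) can be sketched on a made-up SQLite table as below. Because the query sorts by ID descending, the next page continues with IDs lower than the last one shown. The helper names are hypothetical, and the sketch only checks that both approaches return the same pages.

```python
import sqlite3


def make_db(n=25):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE notification_master (id INTEGER PRIMARY KEY, disable TEXT)")
    con.executemany("INSERT INTO notification_master VALUES (?, 'N')",
                    [(i,) for i in range(1, n + 1)])
    return con


def offset_page(con, page_no, size=10):
    # Offset paging: all preceding rows are produced and then thrown away.
    return [r[0] for r in con.execute(
        "SELECT id FROM notification_master WHERE disable = 'N' "
        "ORDER BY id DESC LIMIT ? OFFSET ?", (size, page_no * size))]


def keyset_page(con, last_id=None, size=10):
    # Keyset paging: continue directly below the last ID the client has seen.
    if last_id is None:
        sql = ("SELECT id FROM notification_master WHERE disable = 'N' "
               "ORDER BY id DESC LIMIT ?")
        params = (size,)
    else:
        sql = ("SELECT id FROM notification_master WHERE disable = 'N' AND id < ? "
               "ORDER BY id DESC LIMIT ?")
        params = (last_id, size)
    return [r[0] for r in con.execute(sql, params)]


if __name__ == "__main__":
    con = make_db()
    first = keyset_page(con)
    print(first, keyset_page(con, first[-1]))
```

With an index on id, the keyset variant can start reading right at the remembered ID instead of producing and discarding the skipped rows.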
I've been trying to get the max(id) for the max(payment_date) of every account_id, as there are instances where there are different entries for the same max(payment_date). The ids are the payment references for the account_ids. So every account_id needs to have one entry with the max(payment_date) and the max(id) for that date. The problem is that there are entries where the max(id) for the account_id does not belong to the max(payment_date); otherwise I would have just used max(id). The code below doesn't work because of this, since it excludes entries where the max(id) is not for the max(payment_date). Thanks in advance.
select *
from (
select payments.*
from (
select account_id, max(payment_date) as last_payment, max(id) as last_payment1
from energy.payments
where state = 'success'
and amount_pennies > 0
and description not ilike '%credit%'
group by account_id
) as last_payment_table
inner join energy.payments as payments
on payments.account_id = last_payment_table.account_id
and payments.payment_date = last_payment_table.last_payment
and payments.id = last_payment_table.last_payment1
) as paymentst1
Use DISTINCT ON (PostgreSQL). I can't really follow your query (sample data would be a big help!), but the idea is:
select distinct on (p.account_id) p.*
from energy.payments p
order by p.account_id, p.payment_date desc, p.id desc;
You can add additional filtering logic (for example the state, amount_pennies and description conditions) in a WHERE clause. That logic is not explained in your question but is suggested by the code you've included.
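DISTINCT ON (account_id) keeps the first row of each account_id group according to the ORDER BY, so ordering by payment_date DESC, id DESC picks the latest date and, among rows on that date, the highest ID. The Python sketch below only emulates that semantics on made-up rows (it is not how PostgreSQL executes it); note that account 1's highest ID overall (10) is on an earlier date and is correctly not chosen.

```python
from itertools import groupby
from operator import itemgetter

# Made-up rows reproducing the situation from the question.
PAYMENTS = [
    {"id": 10, "account_id": 1, "payment_date": "2024-01-05"},
    {"id": 5, "account_id": 1, "payment_date": "2024-01-06"},
    {"id": 7, "account_id": 1, "payment_date": "2024-01-06"},
    {"id": 3, "account_id": 2, "payment_date": "2024-02-01"},
]


def distinct_on_latest(payments):
    """Emulate: SELECT DISTINCT ON (account_id) * ... ORDER BY account_id, payment_date DESC, id DESC."""
    # Sort by (payment_date, id) descending first ...
    rows = sorted(payments, key=itemgetter("payment_date", "id"), reverse=True)
    # ... then stably by account_id ascending, keeping the DESC order within each account.
    rows.sort(key=itemgetter("account_id"))
    # DISTINCT ON keeps only the first row of each account_id group.
    return [next(group) for _, group in groupby(rows, key=itemgetter("account_id"))]


if __name__ == "__main__":
    for row in distinct_on_latest(PAYMENTS):
        print(row)
```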
It is hard to understand the question, but I think you mean this:
SELECT *
FROM payments p
WHERE NOT EXISTS (
SELECT *
FROM payments nx
WHERE nx.account_id = p.account_id -- same account
AND ( nx.payment_date > p.payment_date -- a more recent date
OR ( nx.payment_date = p.payment_date -- or the same date
AND nx.id > p.id ) -- with a higher ID
)
);
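A minimal SQLite check of the NOT EXISTS idea on made-up data (Python's sqlite3 module), including a row with a higher ID on an earlier date, which is the case from the question. The condition used here excludes a row when the same account has a later date, or the same date with a higher ID; a plain `nx.payment_date >= p.payment_date AND nx.id > p.id` test would wrongly keep that older row with the higher ID as well.

```python
import sqlite3

# Made-up rows: (id, account_id, payment_date)
ROWS = [
    (10, 1, "2024-01-05"),  # highest ID of account 1, but not on the latest date
    (5, 1, "2024-01-06"),
    (7, 1, "2024-01-06"),   # latest date, highest ID on that date -> expected
    (3, 2, "2024-02-01"),
]

QUERY = """
SELECT p.account_id, p.id, p.payment_date
FROM payments p
WHERE NOT EXISTS (
    SELECT *
    FROM payments nx
    WHERE nx.account_id = p.account_id
      AND (nx.payment_date > p.payment_date
           OR (nx.payment_date = p.payment_date AND nx.id > p.id))
)
ORDER BY p.account_id
"""


def latest_payments(rows):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE payments (id INTEGER, account_id INTEGER, payment_date TEXT)")
    con.executemany("INSERT INTO payments VALUES (?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    print(latest_payments(ROWS))
```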
Or, using a window function:
select *
from (
select *
, row_number() OVER(PARTITION BY account_id
ORDER BY payment_date DESC,id DESC) as rn
from energy.payments
where state = 'success'
and amount_pennies > 0
and description not ilike '%credit%'
) x
WHERE x.rn=1
;
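A minimal SQLite check of the ROW_NUMBER version on made-up rows, via Python's sqlite3 module (window functions need SQLite 3.25 or later). SQLite has no ILIKE, but its LIKE is case-insensitive for ASCII characters by default, so NOT LIKE stands in for NOT ILIKE here; the energy schema prefix is dropped.

```python
import sqlite3

# Made-up rows: (id, account_id, payment_date, state, amount_pennies, description)
ROWS = [
    (10, 1, "2024-01-05", "success", 500, "Direct debit"),
    (5, 1, "2024-01-06", "success", 500, "Card"),
    (7, 1, "2024-01-06", "success", 500, "CREDIT refund"),  # removed by the description filter
    (3, 2, "2024-02-01", "failed", 900, "Card"),            # removed by the state filter
    (4, 2, "2024-01-15", "success", 900, "Card"),
]

QUERY = """
SELECT id, account_id, payment_date
FROM (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY account_id
                              ORDER BY payment_date DESC, id DESC) AS rn
    FROM payments
    WHERE state = 'success'
      AND amount_pennies > 0
      AND description NOT LIKE '%credit%'  -- case-insensitive for ASCII in SQLite
) x
WHERE x.rn = 1
ORDER BY account_id
"""


def latest_successful(rows):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE payments (id INTEGER, account_id INTEGER, payment_date TEXT, "
                "state TEXT, amount_pennies INTEGER, description TEXT)")
    con.executemany("INSERT INTO payments VALUES (?, ?, ?, ?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    print(latest_successful(ROWS))
```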
I have a subscription table and a payments table that I need to join.
I am trying to decide between two options, and performance is a key consideration.
Which of the two options below will perform better?
I am using Impala, and these tables are large (multiple millions of rows). I need to get only one row for every id and date grouping (hence the ROW_NUMBER() analytic function).
I have shortened the queries to illustrate my question:
OPTION 1:
WITH cte
AS (
SELECT *
, SUM(payment_amount) OVER (PARTITION BY id, date)
AS sameday_total
, ROW_NUMBER() OVER (PARTITION BY id, date ORDER BY purchase_number DESC)
AS sameday_rownum
FROM payments
),
payment
AS (
SELECT *
FROM cte
WHERE sameday_rownum = 1
)
SELECT s.*
, p.sameday_total
FROM subscription s
INNER JOIN payment p ON s.id = p.id
OPTION 2:
WITH payment
AS (
SELECT *
, SUM(payment_amount) OVER (PARTITION BY id, date)
AS sameday_total
, ROW_NUMBER() OVER (PARTITION BY id, date ORDER BY purchase_number DESC)
AS sameday_rownum
FROM payments
)
SELECT s.*
, p.sameday_total
FROM subscription s
INNER JOIN payment p ON s.id = p.id
AND p.sameday_rownum = 1
There is also an "Option 0": a more traditional derived table, which does not require any CTE at all.
SELECT s.*
, p.sameday_total
FROM subscription s
INNER JOIN (
SELECT *
, SUM(payment_amount) OVER (PARTITION BY id, date)
AS sameday_total
, ROW_NUMBER() OVER (PARTITION BY id, date ORDER BY purchase_number DESC)
AS sameday_rownum
FROM payments
) p ON s.id = p.id
AND p.sameday_rownum = 1
Options 0, 1 and 2 are all likely to produce identical or very similar explain plans (although I'm more confident about that statement for SQL Server than for Impala).
Adopting a CTE does not, in itself, make a query more efficient or better performing, so the syntactic difference between options 1 and 2 is not significant. I prefer option 0 myself, as I like to reserve CTEs for specific tasks (e.g. recursion).
What you should do is compare the explain plans that each option produces.
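SQLite is not Impala, so the sketch below says nothing about Impala's plans or speed. It only checks, on made-up data via Python's sqlite3 module, that options 0, 1 and 2 return the same rows, and shows the habit of looking at each variant's plan (EXPLAIN QUERY PLAN in SQLite; Impala has its own EXPLAIN statement). The subscription columns and the data are invented, and the aliases s and p are declared so the select lists resolve.

```python
import sqlite3

SETUP = """
CREATE TABLE subscription (id INTEGER, name TEXT);
CREATE TABLE payments (id INTEGER, date TEXT, purchase_number INTEGER, payment_amount INTEGER);
INSERT INTO subscription VALUES (1, 'basic'), (2, 'pro');
INSERT INTO payments VALUES
  (1, '2024-01-01', 1, 100), (1, '2024-01-01', 2, 50),
  (1, '2024-01-02', 3, 70),  (2, '2024-01-01', 4, 30);
"""

# The shared windowed derived table / CTE body.
WINDOWED = """
    SELECT *,
           SUM(payment_amount) OVER (PARTITION BY id, date) AS sameday_total,
           ROW_NUMBER() OVER (PARTITION BY id, date ORDER BY purchase_number DESC) AS sameday_rownum
    FROM payments
"""

OPTIONS = {
    "option0": "SELECT s.*, p.sameday_total FROM subscription s "
               "INNER JOIN (" + WINDOWED + ") p ON s.id = p.id AND p.sameday_rownum = 1",
    "option1": "WITH cte AS (" + WINDOWED + "), "
               "payment AS (SELECT * FROM cte WHERE sameday_rownum = 1) "
               "SELECT s.*, p.sameday_total FROM subscription s INNER JOIN payment p ON s.id = p.id",
    "option2": "WITH payment AS (" + WINDOWED + ") "
               "SELECT s.*, p.sameday_total FROM subscription s "
               "INNER JOIN payment p ON s.id = p.id AND p.sameday_rownum = 1",
}


def run_all():
    """Return the (sorted) result rows of every option."""
    con = sqlite3.connect(":memory:")
    con.executescript(SETUP)
    results = {name: sorted(con.execute(sql).fetchall()) for name, sql in OPTIONS.items()}
    con.close()
    return results


def query_plans():
    """Return the 'detail' column of EXPLAIN QUERY PLAN for every option."""
    con = sqlite3.connect(":memory:")
    con.executescript(SETUP)
    plans = {name: [row[-1] for row in con.execute("EXPLAIN QUERY PLAN " + sql)]
             for name, sql in OPTIONS.items()}
    con.close()
    return plans


if __name__ == "__main__":
    print(run_all())
    for name, plan in query_plans().items():
        print(name, plan)
```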
This works perfectly until an order comes in where StationID is not 2. My expectation was that SQL would search for rows where both conditions are met and display those, rather than first finding the row where TimePlaced is the maximum and then, if its StationID is not 2, displaying nothing, which is what it is doing.
SELECT OrderNo
FROM Orders
WHERE TimePlaced = (SELECT max(TimePlaced) FROM Orders)
AND StationID=2
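A made-up SQLite reproduction (Python's sqlite3 module) of why this returns nothing: the subquery computes the maximum TimePlaced over all stations, so it matches only the newest order overall. If that order belongs to another station, the StationID=2 condition removes it and no row is left.

```python
import sqlite3


def original_query_result():
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE Orders (OrderNo INTEGER, StationID INTEGER, TimePlaced TEXT)")
    con.executemany("INSERT INTO Orders VALUES (?, ?, ?)", [
        (1, 2, "2024-01-01 09:00"),
        (2, 2, "2024-01-01 10:00"),  # latest order at station 2
        (3, 5, "2024-01-01 11:00"),  # newest order overall, but at station 5
    ])
    rows = con.execute("""
        SELECT OrderNo
        FROM Orders
        WHERE TimePlaced = (SELECT max(TimePlaced) FROM Orders)  -- max over ALL stations
          AND StationID = 2
    """).fetchall()
    con.close()
    return rows


if __name__ == "__main__":
    print(original_query_result())
```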
Add your condition to the inner SELECT as well, so that the maximum is computed only over the orders of station 2:
SELECT OrderNo
FROM Orders
WHERE TimePlaced =
(
SELECT max(TimePlaced)
FROM Orders
WHERE StationID=2
)
AND StationID=2
If you don't want to repeat the condition, correlate the inner SELECT with the outer one:
SELECT OrderNo
FROM Orders O
WHERE TimePlaced =
(
SELECT max(TimePlaced)
FROM Orders
WHERE StationID=O.StationID
)
AND StationID=2
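Another option is a CTE with ROW_NUMBER (note that if several orders share the latest TimePlaced, ROW_NUMBER returns only one of them, whereas the queries above return all of them):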
;WITH OrdersCTE AS
(
SELECT *,
ROW_NUMBER() OVER(PARTITION BY StationID ORDER BY TimePlaced Desc) AS rn
FROM Orders
)
SELECT *
FROM OrdersCTE
WHERE rn = 1
AND StationID = 2
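The three fixes can be checked against the same kind of made-up data in SQLite via Python's sqlite3 module (the leading semicolon of the CTE version is dropped; it only terminates a previous statement in T-SQL). Each should return order 2, the latest order at station 2, even though order 3 at another station is newer.

```python
import sqlite3

FIXED_QUERIES = {
    "condition_in_both": """
        SELECT OrderNo FROM Orders
        WHERE TimePlaced = (SELECT max(TimePlaced) FROM Orders WHERE StationID = 2)
          AND StationID = 2
    """,
    "correlated": """
        SELECT OrderNo FROM Orders O
        WHERE TimePlaced = (SELECT max(TimePlaced) FROM Orders WHERE StationID = O.StationID)
          AND StationID = 2
    """,
    "row_number": """
        WITH OrdersCTE AS (
            SELECT *,
                   ROW_NUMBER() OVER (PARTITION BY StationID ORDER BY TimePlaced DESC) AS rn
            FROM Orders
        )
        SELECT OrderNo FROM OrdersCTE
        WHERE rn = 1 AND StationID = 2
    """,
}


def run_fixed():
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE Orders (OrderNo INTEGER, StationID INTEGER, TimePlaced TEXT)")
    con.executemany("INSERT INTO Orders VALUES (?, ?, ?)", [
        (1, 2, "2024-01-01 09:00"),
        (2, 2, "2024-01-01 10:00"),  # latest order at station 2
        (3, 5, "2024-01-01 11:00"),  # newest order overall, at another station
    ])
    results = {name: con.execute(sql).fetchall() for name, sql in FIXED_QUERIES.items()}
    con.close()
    return results


if __name__ == "__main__":
    print(run_fixed())
```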