How to use min() and max() in an efficient way? - sql

I have an sql query where I check if a value is between a max and a min value of a table. I've now implement this as follows:
SELECT spectrum_id, feature_table_id
FROM 'spectrum', 'feature'
WHERE `spectrum`.msrun_msrun_id = 1
AND `feature`.msrun_msrun_id = 1
AND (SELECT min(rt) FROM `convexhull` WHERE `convexhull`.feature_feature_table_id = `feature`.feature_table_id) <= scan_start_time
AND scan_start_time <= (SELECT max(rt) FROM `convexhull` WHERE 'convexhull'.feature_feature_table_id = 'feature'.feature_table_id)
AND (SELECT min(mz) FROM `convexhull` WHERE `convexhull`.feature_feature_table_id = `feature`.feature_table_id) <= base_peak_mz
AND base_peak_mz <= (SELECT max(mz) FROM `convexhull` WHERE `convexhull`.feature_feature_table_id = `feature`feature_table_id)
This is running very slowly, because I'm selecting from convexhull 4 times every time I run this query, so I tried to improve it using an inner join:
SELECT spectrum_id, feature_table_id
FROM 'spectrum', 'feature'
INNER JOIN `convexhull` ON `convexhull`.feature_feature_table_id = `feature`.feature_table_id
WHERE `spectrum`.msrun_msrun_id = ? "+
AND `feature`.msrun_msrun_id = ? "+
AND min(`convexhull`.rt) <= scan_start_time "+
AND scan_start_time <= max(`convexhull`.rt) "+
AND min(`convexhull`.mz) <= base_peak_mz "+
AND base_peak_mz <= max(`convexhull`.mz)", spectrumFeature_InputValues)
However, the min() and max() statements can only be used after a select statement. How can I make the first query more efficient, so that I can get the min and max rt and mz without having to do 4 queries?

EDIT: had a few more mins and looked again and realised all the data comes from that one table so something like this should work
SELECT
spectrum_id
,feature_table_id
FROM
spectrum AS s
INNER JOIN feature AS f
on f.msrun_msrun_id = s.msrun_msrun_id
INNER JOIN (select
feature_feature_table_id
,min(rt) AS rtMin
,max(rt) AS rtMax
,min(mz) AS mzMin
,max(mz) as mzMax
FROM
convexhull
GROUP BY
feature_feature_table_id
) AS t
ON t.feature_feature_table_id = f.feature_table_id
WHERE
s.msrun_msrun_id = 1
AND s.scan_start_time >= t.rtMin
AND s.scan_start_time <= t.rtMax
AND base_peak_mz >= t.mxMin
AND base_peak_mz <= t.mzMax
I think you want to select from the convexhull table and group by feature_feature_table_id getting the min and max rt within that grouping.
you can then wrap that select in brackets give it a name (as t) and join to it.
Hope this is enought to get you on the road.. if not create a sample schema here: http://sqlfiddle.com/
and put in your query and i can modify it.
as a side note, I think you wan to join these tables on a particular field rather than select from both with a where clause compare:
SELECT spectrum_id, feature_table_id
FROM 'spectrum', 'feature'
WHERE `spectrum`.msrun_msrun_id = 1
AND `feature`.msrun_msrun_id = 1
and:
SELECT
spectrum_id
,feature_table_id
FROM
spectrum AS s
INNER JOIN feature AS f
on f.msrun_msrun_id = s.msrun_msrun_id
WHERE
s.msrun_msrun_id = 1
If i have got something wrong there let me know.

Related

Optimizing SQL Query speed

I am trying to optimize my SQL query below as I am using a very old RDMS called firebird. I tried rearranging the items in my where clause and removing the order by statement but the query still seems to take forever to run. Unfortunately firebird doesn't support Explain Execution Plan Functionalities and therefore I cannot identify the code that is holding up the query.
select T.veh_reg_no,T.CON_NO, sum(T.pos_gpsunlock) as SUM_GPS_UNLOCK,
count(T.pos_gpsunlock) as SUM_REPORTS, contract.con_name
from
(
select veh_reg_no,CON_NO,
case when pos_gpsunlock = upper('T') then 1 else 0 end as pos_gpsunlock
from vehpos
where veh_reg_no in
( select regno
from fleetvehicle
where fleetno in (97)
) --DS5
and pos_timestamp > '2022-07-01'
and pos_timestamp < '2022-08-01'
) T
join contract on T.con_no = contract.con_no
group by T.veh_reg_no, T.con_no,contract.con_name
order by SUM_GPS_UNLOCK desc;
If anyone can help it would be greatly appreciated.
I'd either comment out some of the sub-queries or remove a join or aggregation and see if that improves it. Once you find the offending code maybe you can move it or re-write it. I know nothing of Firebird but I'd approach that query with the below code, wrapping the aggregation outside of the joins and removing the "Where in" clause.
If nothing works can you create an aggregation table or pre-filtered table and use that?
select
x.*
,sum(case when x.pos_gpsunlock = upper('T') then 1 else 0 end) as SUM_GPS_UNLOCK
,count(*) as SUM_REPORTS
FROM (
select
a.veh_reg_no
,a.pos_gpsunlock
,a.CON_NO
,c.con_name
FROM vehpos a
JOIN fleetvehicle b on a.veg_reg_no = b.reg_no and b.fleetno = 97 and b.pos_timestamp between '222-07-01' and '2022-08-01'
JOIN contract c on a.con_no = contract.con_no
) x
Group By....
This might help by converting subqueries to joins and reducing nesting. Also an = instead of IN() operation.
select vp.veh_reg_no,vp.con_no,c.con_name,
count(*) as SUM_REPORTS,
sum(case when pos_gpsunlock = upper('T') then 1 else 0 end) as SUM_GPS_UNLOCK
from vehpos vp
inner join fleetvehicle fv on fv.fleetno = 97 and fv.regno = vp.veh_reg_no
inner join contract c on vp.con_no = c.con_no
where vp.pos_timestamp >= '2022-07-01'
and vp.pos_timestamp < '2022-08-01'
group by vp.veh_reg_no, vp.con_no, c.con_name

Display only result > 2

I face a problem with the result on my script.
My formula for MARGIN is ((plnamt-(ibhexc/ibhand))/plnamt)*100.
I want to display only result > 2. How to do this? Please help.
This my script:
select
a.plnitm,a.plnstr,max(a.plncdt),max(a.plnndt),max(a.plnamt),max(a.plnevt),b.idept,c.ibhand,c.ibhexc,
decimal((c.ibhexc/c.ibhand),12,4) as AVG_COST,
decimal(((((max(a.plnamt)-(c.ibhexc/c.ibhand))/(max(a.plnamt))))*100),12,4) as MARGIN
from prcpln a
inner join invmst b on a.plnitm = b.inumbr
inner join invbal c on a.plnitm = c.inumbr and a.plnstr = c.istore and c.ibhand <> 0
where a.plnstr = ''14006''
group by a.plnitm,a.plnstr,b.idept,c.ibhand,c.ibhexc
order by a.plnitm
Simplistically, for situations like this, you can take your query and put it inside a cte:
WITH q AS (
select
a.plnitm,a.plnstr,max(a.plncdt),max(a.plnndt),max(a.plnamt),max(a.plnevt),b.idept,c.ibhand,c.ibhexc,
decimal((c.ibhexc/c.ibhand),12,4) as AVG_COST,
decimal(((((max(a.plnamt)-(c.ibhexc/c.ibhand))/(max(a.plnamt))))*100),12,4) as MARGIN
from prcpln a
inner join invmst b on a.plnitm = b.inumbr
inner join invbal c on a.plnitm = c.inumbr and a.plnstr = c.istore and c.ibhand <> 0
where a.plnstr = ''14006''
group by a.plnitm,a.plnstr,b.idept,c.ibhand,c.ibhexc
)
SELECT * FROM q WHERE margin > 2
ORDER BY q.plnitm
This is similar to the advice to use HAVING - a HAVING is a "where clause that is done after a GROUP BY"
A cte is a way of taking some calculated block of data and giving it an alias that can be used just like a table. I wanted to answer this way to point out to you that queries don't have to be formed purely from tables; tables are just blocks of data (with a name), and the output from a select is also "just a block of data" that can be given a name (by use of a cte or subquery) and then used just like a table is

where statement execute before inner join

I'm trying to grab the first instance of each result with a sysAddress of less than 4. However my statement currently grabs the min(actionTime) result first before applying the where sysAddress < 4. I'm trying to have the input for the inner join as the where sysAddress < 4 however i cant seem to figure out how to do it.
Should i be nesting it all differently? I didnt want to create an additional layer of table joins. Is this possible? I'm a bit lost at all the answers ive found.
SELECT
tblHistoryObject.info,
tblHistory.actionTime,
tblHistoryUser.userID,
tblHistoryUser.firstName,
tblHistoryUser.surname,
tblHistory.eventID,
tblHistoryObject.objectID,
tblHistorySystem.sysAddress
FROM tblHistoryObject
JOIN tblHistory
ON (tblHistory.historyObjectID = tblHistoryObject.historyObjectID)
JOIN tblHistorySystem
ON (tblHistory.historySystemID = tblHistorySystem.historySystemID)
JOIN tblHistoryUser
ON (tblHistory.historyUserID = tblHistoryUser.historyUserID)
INNER JOIN (SELECT
MIN(actionTime) AS recent_date,
historyObjectID
FROM tblHistory
GROUP BY historyObjectID) AS t2
ON t2.historyObjectID = tblHistoryObject.historyObjectID
AND tblHistory.actionTime = t2.recent_date
WHERE sysAddress < 4
ORDER BY actionTime ASC
WITH
all_action_times AS
(
SELECT
tblHistoryObject.info,
tblHistory.actionTime,
tblHistoryUser.userID,
tblHistoryUser.firstName,
tblHistoryUser.surname,
tblHistory.eventID,
tblHistoryObject.objectID,
tblHistorySystem.sysAddress,
ROW_NUMBER() OVER (PARTITION BY tblHistoryObject.historyObjectID
ORDER BY tblHistory.actionTime
)
AS historyObjectID_SeqByActionTime
FROM
tblHistoryObject
INNER JOIN
tblHistory
ON tblHistory.historyObjectID = tblHistoryObject.historyObjectID
INNER JOIN
tblHistorySystem
ON tblHistory.historySystemID = tblHistorySystem.historySystemID
INNER JOIN
tblHistoryUser
ON tblHistory.historyUserID = tblHistoryUser.historyUserID
WHERE
tblHistorySystem.sysAddress < 4
)
SELECT
*
FROM
all_action_times
WHERE
historyObjectID_SeqByActionTime = 1
ORDER BY
actionTime ASC
This does exactly what your original query did, without trying to filter by action_time.
Then it appends a new column, using ROW_NUMBER() to generate sequences from 1 for each individual tblHistoryObject.historyObjectID. Then it takes only the rows where this sequence value is 1 (the first row per historyObjectID, when sorted in action_time order).

Should a subquery on a join use tables from an outer query in the where clause?

I need to add a subquery to a join, because one payment can have more than one allotment, so I only need to account for the first match (where rownum = 1).
However, I'm not sure if adding pmt from the outer query to the subquery on the allotment join is best.
Should I be doing this differently in the event of performance hits, etc.. ?
SELECT
pmt.payment_uid,
alt.allotment_uid,
FROM
payment pmt
/* HERE: is the reference to pmt.pay_key and pmt.client_id
incorrect in the below subquery? */
INNER JOIN allotment alc ON alt.allotment_uid = (
SELECT
allotment_uid
FROM
allotment
WHERE
pay_key = pmt.pay_key
AND
pay_code = 'xyz'
AND
deleted = 'N'
AND
client_id = pmt.client_id
AND
ROWNUM = 1
)
WHERE
AND
pmt.deleted = 'N'
AND
pmt.date_paid >= TO_DATE('2017-07-01')
AND
pmt.date_paid < TO_DATE('2017-10-01') + 1;
It's difficult to identify the performance issue in your query without seeing an explain plan output. You query does seem to do an additional SELECT on the allotment for every record from the main query.
Here is a version which doesn't use correlated sub query. Obviously I haven't been able to test it. It does a simple join in and then filters all records except one of the allotments. Hope this helps.
WITH v_payment
AS
(
SELECT
pmt.payment_uid,
alt.allotment_uid,
ROW_NUMBER () OVER(PARTITION BY allotment_id) r_num
FROM
payment pmt JOIN allotment alt
ON (pmt.pay_key = alt.pay_key AND
pmt.client_id = alt.client_id)
WHERE pmt.deleted = 'N' AND
pmt.date_paid >= TO_DATE('2017-07-01') AND
pmt.date_paid < TO_DATE('2017-10-01') + 1 AND
alt.pay_code = 'xyz' AND
alt.deleted = 'N'
)
SELECT payment_uid,
allotment_uid
FROM v_payment
WHERE r_num = 1;
Let's know how this performs!
You can phrase the query that way. I would be more likely to do:
SELECT . . .
FROM payment p INNER JOIN
(SELECT a.*,
ROW_NUMBER() OVER (PARTITION BY pay_key, client_id
ORDER BY allotment_uid
) as seqnum
FROM allotment a
WHERE pay_code = 'xyz' AND deleted = 'N'
) a
ON a.pay_key = p.pay_key AND a.client_id = p.client_id AND
seqnum = 1
WHERE p.deleted = 'N' AND
p.date_paid >= DATE '2017-07-01' AND
p.date_paid < (DATE '2017-10-01') + 1;

Combine two SQL Queries to prevent having to use a loop

I need to combine two SQL Queries and I'm hurting myself trying to think through it. My first query gets the number of visitors per day and the second query gets the number of unique visitors per day.
Query 1 - For getting the number of visits
SELECT Count(server_instances.game_id) AS visit_count,
refined_player_visits.visit_date AS visit_date
FROM work.refined_player_visits
INNER JOIN tapi.server_instances
ON server_instances.server_id = refined_player_visits.server_id
WHERE ( server_instances.game_id = "31" )
GROUP BY visit_date;
Query 2 - For getting the number of unique visits
SELECT Count(visit_counts.unique_visit_date) AS unique_visits
FROM (SELECT Count(refined_player_visits.server_id) AS visit_count,
refined_player_visits.visit_date AS unique_visit_date
FROM refined_player_visits
INNER JOIN server_instances
ON server_instances.server_id =
refined_player_visits.server_id
WHERE ( server_instances.place_id = "31"
AND refined_player_visits.visit_date <= CURRENT_VISIT_DATE )
GROUP BY refined_player_visits.roblox_id) AS visit_counts
WHERE ( visit_counts.visit_date = CURRENT_VISIT_DATE
AND visit_counts.visit_count = 1 )
Because this was originally for a web application, I got the results back from the first query and looped through each one. During each loop I would do the second query (where CURRENT_VISIT_DATE is actually the visit_date from the first query.
I'd like to turn this into one query using a JOIN, perhaps. I'm migrating to another system and I don't have the option of doing a second query in the loop statement, so I want to just combine the two queries. I can't seem to wrap my head around it, though.
Does this return what you want?
SELECT *
FROM
(
SELECT Count(server_instances.game_id) AS visit_count,
refined_player_visits.visit_date AS visit_date
FROM work.refined_player_visits
INNER JOIN tapi.server_instances
ON server_instances.server_id = refined_player_visits.server_id
WHERE ( server_instances.game_id = "31" )
GROUP BY visit_date
) AS FirstQuery,
(
SELECT Count(visit_counts.unique_visit_date) AS unique_visits
FROM (SELECT Count(refined_player_visits.server_id) AS visit_count,
refined_player_visits.visit_date AS unique_visit_date
FROM refined_player_visits
INNER JOIN server_instances
ON server_instances.server_id =
refined_player_visits.server_id
WHERE (server_instances.place_id = "31"
AND refined_player_visits.visit_date <= FirstQuery.visit_date)
GROUP BY refined_player_visits.roblox_id) AS visit_counts
WHERE ( visit_counts.visit_date = FirstQuery.visit_date
AND visit_counts.visit_count = 1 )
) AS SecondQuery