Most efficient way to perform an SQL-Statement containing a max-aggregate in the sub-query - sql

I have a currency table and an exchange_rate_log table. The latter contains over a billion of records. The exchange_rate_log table contains the traded exchange rates for many currencies for the last couple of years.
Now I have to select for all available currencies (in the table currency) the latest valid traded exchange rate for a given exchange_currency and a given date.
So if the given exchange_currency would be "EUR" and the date would be yesterday. The result would return the latest trades of all available currencies into "EUR" in the time window from the first available entries in the table "exchange_rate_log" until yesterday.
The following query shows a possible way to get the answer. However the given query does not perform very well.
SELECT cur.name, log.price, log.valid_at
FROM currency cur
JOIN exchange_rate_log log ON (cur.id = log.currency_id)
WHERE log.valid_at = (SELECT max(log2.valid_at)
FROM exchange_rate_log log2
WHERE log2.currency_id = cur.id
AND log2.exchange_currency = ?
AND log2.valid_at < ?);
Is there a possibility to get the same result with an adapted query which would perform better? Is it possible to create an index to boost the performance of the above query?
Remark: The target dbms is Oracle.

You didn't tag which DBMS you're using, so this is using RANK from Standard SQL:
select *
from
(
SELECT cur.name, log.price, log.valid_at,
RANK()
OVER (PARTITION BY cur.id
order by valid_at DESC) as rnk
FROM currency cur
JOIN exchange_rate_log log on (cur.id = log.currency_id)
WHERE log.exchange_currency = ?
AND log.valid_at < ?
) dt
where rnk = 1;
If this is more efficient depends on the optimizing capabilities of the DBMS.
Otherwise adding a log.valid_at < ? condition to your original query might help.

SELECT TOP 1
cur.name, log.price, log.valid_at
FROM exchange_rate_log log
INNER JOIN currency cur
ON log.currency_id = cur.id
WHERE log.exchange_currency = ?
AND log.valid_at < ?
ORDER BY log.valid DESC;
It is also very important to have an index on log.valid and log.exchange_currency or else nothing is going to make your query fast.
I also think that the performance of the query presented will be similar but I think that it is slightly simplified.

Related

Select only the row with the max value, but the column with this info is a SUM()

I have the following query:
SELECT DISTINCT
CAB.CODPARC,
PAR.RAZAOSOCIAL,
BAI.NOMEBAI,
SUM(VLRNOTA) AS AMOUNT
FROM TGFCAB CAB, TGFPAR PAR, TSIBAI BAI
WHERE CAB.CODPARC = PAR.CODPARC
AND PAR.CODBAI = BAI.CODBAI
AND CAB.TIPMOV = 'V'
AND STATUSNOTA = 'L'
AND PAR.CODCID = 5358
GROUP BY
CAB.CODPARC,
PAR.RAZAOSOCIAL,
BAI.NOMEBAI
Which the result is this. Company names and neighborhood hid for obvious reasons
The query at the moment, for those who don't understand Latin languages, is giving me clients, company name, company neighborhood, and the total value of movements.
in the WHERE clause it is only filtering sales movements of companies from an established city.
But if you notice in the Select statement, the column that is retuning the value that aggregates the total amount of value of sales is a SUM().
My goal is to return only the company that have the maximum value of this column, if its a tie, display both of em.
This is where i'm struggling, cause i can't seem to find a simple solution. I tried to use
WHERE AMOUNT = MAX(AMOUNT)
But as expected it didn't work
You tagged the question with the whole bunch of different databases; do you really use all of them?
Because, "PL/SQL" reads as "Oracle". If that's so, here's one option.
with temp as
-- this is your current query
(select columns,
sum(vrlnota) as amount
from ...
where ...
)
-- query that returns what you asked for
select *
from temp t
where t.amount = (select max(a.amount)
from temp a
);
You should be able to achieve the same without the need for a subquery using window over() function,
WITH T AS (
SELECT
CAB.CODPARC,
PAR.RAZAOSOCIAL,
BAI.NOMEBAI,
SUM(VLRNOTA) AS AMOUNT,
MAX(VLRNOTA) over() AS MAMOUNT
FROM TGFCAB CAB
JOIN TGFPAR PAR ON PAR.CODPARC = CAB.CODPARC
JOIN TSIBAI BAI ON BAI.CODBAI = PAR.CODBAI
WHERE CAB.TIPMOV = 'V'
AND STATUSNOTA = 'L'
AND PAR.CODCID = 5358
GROUP BY CAB.CODPARC, PAR.RAZAOSOCIAL, BAI.NOMEBAI
)
SELECT CODPARC, RAZAOSOCIAL, NOMEBAI, AMOUNT
FROM T
WHERE AMOUNT=MAMOUNT
Note it's usually (always) beneficial to join tables using clear explicit join syntax. This should be fine cross-platform between Oracle & SQL Server.

SQL Math Operation In Correlated Subquery

I am working with three tables, basically, one is a bill of materials, one contains part inventory, and the last one contains work orders or jobs. I am trying to find out if it is possible to have a correlated subquery that can perform a math operation using a value from the outer query. Here's an example of what I'm trying to do:
SELECT A.work_order,A.assembly,A.job_quantity,
(SELECT COUNT(X.part_number)
FROM bom X
WHERE X.assembly = A.assembly
AND (X.quantity_required * A.job_quantity) >= (SELECT Y.quantity_available FROM inventory Y WHERE
Y.part_number = X.part_number)) AS negatives
FROM work_orders A
ORDER BY A.assembly ASC
I am attempting to find out, for a given work order, if there are parts that we do not have enough of to build the assembly. I'm currently getting an "Error correlating fields" error. Is it possible to do this kind of operation in a single query?
Try moving the subquery to a join, something like this:
SELECT a.work_order, a.assembly, a.job_quantity, n.negatives
FROM work_orders a JOIN (SELECT x.part_number, COUNT(x.part_number) as negatives
FROM bom x JOIN work_orders b
ON x.assembly = b.assembly
WHERE (x.quantity_required * b.job_quantity) >= (SELECT y.quantity_available
FROM inventory y WHERE
y.part_number = x.part_number)
GROUP BY x.part_number) n
ON a.part_number = n.part_number
ORDER BY a.assembly ASC
Or create a temporary cursor with the subquery and then use it to join the main table.
Hope this helps.
Luis

SQL Filtering duplicate rows due to bad ETL

The database is Postgres but any SQL logic should help.
I am retrieving the set of sales quotations that contain a given product within the bill of materials. I'm doing that in two steps: step 1, retrieve all DISTINCT quote numbers which contain a given product (by product number).
The second step, retrieve the full quote, with all products listed for each unique quote number.
So far, so good. Now the tough bit. Some rows are duplicates, some are not. Those that are duplicates (quote number & quote version & line number) might or might not have maintenance on them. I want to pick the row that has maintenance greater than 0. The duplicate rows I want to exclude are those that have a 0 maintenance. The problem is that some rows, which have no duplicates, have 0 maintenance, so I can't just filter on maintenance.
To make this exciting, the database holds quotes over 20+ years. And the data scientists guys have just admitted that maybe the ETL process has some bugs...
--- step 0
--- cleanup the workspace
SET CLIENT_ENCODING TO 'UTF8';
DROP TABLE IF EXISTS product_quotes;
--- step 1
--- get list of Product Quotes
CREATE TEMPORARY TABLE product_quotes AS (
SELECT DISTINCT master_quote_number
FROM w_quote_line_d
WHERE item_number IN ( << model numbers >> )
);
--- step 2
--- Now join on that list
SELECT
d.quote_line_number,
d.item_number,
d.item_description,
d.item_quantity,
d.unit_of_measure,
f.ref_list_price_amount,
f.quote_amount_entered,
f.negtd_discount,
--- need to calculate discount rate based on list price and negtd discount (%)
CASE
WHEN ref_list_price_amount > 0
THEN 100 - (ref_list_price_amount + negtd_discount) / ref_list_price_amount *100
ELSE 0
END AS discount_percent,
f.warranty_months,
f.master_quote_number,
f.quote_version_number,
f.maintenance_months,
f.territory_wid,
f.district_wid,
f.sales_rep_wid,
f.sales_organization_wid,
f.install_at_customer_wid,
f.ship_to_customer_wid,
f.bill_to_customer_wid,
f.sold_to_customer_wid,
d.net_value,
d.deal_score,
f.transaction_date,
f.reporting_date
FROM w_quote_line_d d
INNER JOIN product_quotes pq ON (pq.master_quote_number = d.master_quote_number)
INNER JOIN w_quote_f f ON
(f.quote_line_number = d.quote_line_number
AND f.master_quote_number = d.master_quote_number
AND f.quote_version_number = d.quote_version_number)
WHERE d.net_value >= 0 AND item_quantity > 0
ORDER BY f.master_quote_number, f.quote_version_number, d.quote_line_number
The logic to filter the duplicate rows is like this:
For each master_quote_number / version_number pair, check to see if there are duplicate line numbers. If so, pick the one with maintenance > 0.
Even in a CASE statement, I'm not sure how to write that.
Thoughts? The database is Postgres but any SQL logic should help.
I think you will want to use Window Functions. They are, in a word, awesome.
Here is a query that would "dedupe" based on your criteria:
select *
from (
select
* -- simplifying here to show the important parts
,row_number() over (
partition by master_quote_number, version_number
order by maintenance desc) as seqnum
from w_quote_line_d d
inner join product_quotes pq
on (pq.master_quote_number = d.master_quote_number)
inner join w_quote_f f
on (f.quote_line_number = d.quote_line_number
and f.master_quote_number = d.master_quote_number
and f.quote_version_number = d.quote_version_number)
) x
where seqnum = 1
The use of row_number() and the chosen partition by and order by criteria guarantee that only ONE row for each combination of quote_number/version_number will get the value of 1, and it will be the one with the highest value in maintenance (if your colleagues are right, there would only be one with a value > 0 anyway).
Can you do something like...
select
*
from
w_quote_line_d d
inner join
(
select
...
,max(maintenance)
from
w_quote_line_d
group by
...
) d1
on
d1.id = d.id
and d1.maintenance = d.maintenance;
Am I understanding your problem correctly?
Edit: Forgot the group by!
I'm not sure, but maybe you could Group By all other columns and use MAX(Maintenance) to get only the greatest.
What do you think?

Complicated Calculation Using Oracle SQL

I have created a database for an imaginary solicitors, my last query to complete is driving me insane. I need to work out the total a solicitor has made in their career with the company, I have time_spent and rate to multiply and special rate to add. (special rate is a one off charge for corporate contracts so not many cases have them). the best I could come up with is the code below. It does what I want but only displays the solicitors working on a case with a special rate applied to it.
I essentially want it to display the result of the query in a table even if the special rate is NULL.
I have ordered the table to show the highest amount first so i can use ROWNUM to only show the top 10% earners.
CREATE VIEW rich_solicitors AS
SELECT notes.time_spent * rate.rate_amnt + special_rate.s_rate_amnt AS solicitor_made,
notes.case_id
FROM notes,
rate,
solicitor_rate,
solicitor,
case,
contract,
special_rate
WHERE notes.solicitor_id = solicitor.solicitor_id
AND solicitor.solicitor_id = solicitor_rate.solicitor_id
AND solicitor_rate.rate_id = rate.rate_id
AND notes.case_id = case.case_id
AND case.contract_id = contract.contract_id
AND contract.contract_id = special_rate.contract_id
ORDER BY -solicitor_made;
Query:
SELECT *
FROM rich_solicitors
WHERE ROWNUM <= (SELECT COUNT(*)/10
FROM rich_solicitors)
I'm suspicious of your use of ROWNUM in your example query...
Oracle9i+ supports analytic functions, like ROW_NUMBER and NTILE, to make queries like your example easier. Analytics are also ANSI, so the syntax is consistent when implemented (IE: Not on MySQL or SQLite). I re-wrote your query as:
SELECT x.*
FROM (SELECT n.time_spent * r.rate_amnt + COALESCE(spr.s_rate_amnt, 0) AS solicitor_made,
n.case_id,
NTILE(10) OVER (ORDER BY solicitor_made) AS rank
FROM NOTES n
JOIN SOLICITOR s ON s.solicitor_id = n.solicitor_id
JOIN SOLICITOR_RATE sr ON sr.solicitor_id = s.solicitor_id
JOIN RATE r ON r.rate_id = sr.rate_id
JOIN CASE c ON c.case_id = n.case_id
JOIN CONTRACT cntrct ON cntrct.contract_id = c.contract_id
LEFT JOIN SPECIAL_RATE spr ON spr.contract_id = cntrct.contract_id) x
WHERE x.rank = 1
If you're new to SQL, I recommend using ANSI-92 syntax. Your example uses ANSI-89, which doesn't support OUTER JOINs and is considered deprecated. I used a LEFT OUTER JOIN against the SPECIAL_RATE table because not all jobs are likely to have a special rate attached to them.
It's also not recommended to include an ORDER BY in views, because views encapsulate the query -- no one will know what the default ordering is, and will likely include their own (waste of resources potentially).
you need to left join in the special rate.
If I recall the oracle syntax is like:
AND contract.contract_id = special_rate.contract_id (+)
but now special_rate.* can be null so:
+ special_rate.s_rate_amnt
will need to be:
+ coalesce(special_rate.s_rate_amnt,0)

Sql query has unwanted results - Where is the problem?

I have a table which records users's scores at a game (a user may submit 5,10,20,as many scores as he wants).
I need to show the 20 top scores of a game, but per user. (as a user may have submitted eg 4 scores which are the top according to other users's scores)
The query i have written is:
SELECT DISTINCT
`table_highscores`.`userkey`,
max(`table_highscores`.`score`),
`table_users`.`username`,
`table_highscores`.`dateachieved`
FROM
`table_highscores`, `table_users`
WHERE
`table_highscores`.`userkey` = `table_users`.`userkey`
AND
`table_highscores`.`gamekey` = $gamekey
GROUP BY
`userkey`
ORDER BY
max(`table_highscores`.`score`) DESC,
LIMIT 0, 20;
The output result is ok, but there is a problem. When i calculate the difference of days (today-this of dateachieved) the result is wrong. (eg instead of saying "the score was submitted 22 days ago, it says 43 days ago) So,I have to do a second query for each score so to find the true date (meaning +20 queries). Is there any shorter way to find the correct date?
Thanks.
there is a problem. When i calculate the difference of days (today-this of dateachieved) the result is wrong.
There's two issues
the dateachieved isn't likely to be the value associated with the high score
you can use MySQL's DATEDIFF to return the the number of days between the current date and the dateachieved value.
Use:
SELECT u.username,
hs.userkey,
hs.score,
DATEDIFF(NOW(), hs.dateachieved)
FROM TABLE_HIGHSCORES hs
JOIN TABLE_USERNAME u ON u.userkey = hs.userkey
JOIN (SELECT ths.userkey,
ths.gamekey,
ths.max_score,
MAX(ths.date_achieved) 'max_date'
FROM TABLE_HIGHSCORES ths
JOIN (SELECT t.userkey,
t.gamekey,
MAX(t.score) 'max_score'
FROM TABLE_HIGHSCORES t
GROUP BY t.userkey, t.gamekey) ms ON ms.userkey = ths.userkey
AND ms.gamekey = ths.gamekey
AND ms.max_score = ths.score
) x ON x.userkey = hs.userkey
AND x.gamekey = hs.gamekey
AND x.max_score = hs.score
AND x.max_date = hs.dateachieved
WHERE hs.gamekey = $gamekey
ORDER BY hs.score DESC
LIMIT 20
I also changed your query to use ANSI-92 JOIN syntax, from ANSI-89 syntax. It's equivalent performance, but it's easier to read, syntax is supported on Oracle/SQL Server/Postgres/etc, and provides consistent LEFT JOIN support.
Another thing - you only need to use backticks when tables and/or column names are MySQL keywords.
In your query you should use an explicit JOIN and you don't need the DISTINCT keyword.
This query should solve your problem. I am assuming here that it is possible for a user to submit the same highscore more than once on different dates, and if that happens then you want the oldest date:
SELECT T1.userkey, T1.score, username, dateachieved FROM (
(SELECT userkey, max(score) AS score
FROM table_highscores
WHERE gamekey = $gamekey
GROUP BY userkey) AS T1
JOIN
(SELECT userkey, score, min(dateachieved) as dateachieved
FROM table_highscores
WHERE gamekey = $gamekey
GROUP BY userkey, score) AS T2
ON T1.userkey = T2.userkey AND T1.score = T2.score
) JOIN table_users ON T1.userkey = table_users.userkey
LIMIT 20
You didn't say what language you are using to calculate the difference but I'm guessing it's PHP because of the $gamekey you used there (which should be escaped properly, btw).
If your dateachieved field is in the DATETIME format, you can calculate the difference like this:
$diff = round((time() - strtotime($row['dateachieved'])) / 86400);
I think you need to clarify your question a little better. Can you provide some data and expected outputs and then I should be able to help you further?