display the cab with the highest overall maintenance cost - sql

I'm not sure how to get the max of the sum. I thought i could just display it in descending order and then use "rownum=1" but that didnt work. Any suggestions? Here's my code.
select ca_make, sum(ma_cost)
from cab join maintain on ca_cabnum = ma_cabnum
Where rownum =1
group by ca_make
order by sum(ma_cost) desc

ROWNUM() is applied before the ORDER BY. You need to use a sub-query:
select * from (
select ca_make, sum(ma_cost)
from cab join maintain on ca_cabnum = ma_cabnum
group by ca_make
order by sum(ma_cost) desc
)
where rownum = 1
There are several different ways of implementing [top-n] queries in Oracle. Find out more by searching SO for [oracle] [top-n].

First of all, you probably want to use LEFT JOIN. By using JOIN, you're excluding all cabs that have had no maintenance at all. (obviously, this won't matter for finding the highest cost, but it will make a difference when looking for the lowest cost; and it would seriously skew any stats you tried to compile from this query).
Now, to answer your question... try this:
select * from
(select ca_make, sum(ma_cost)
from cab
left join maintain on ca_cabnum = ma_cabnum
group by ca_make
order by sum(ma_cost) desc)
where rownum = 1
here is a good explanation of ROWNUM. Your case is addressed specifically, a little less than half-way down the page (but the whole page is probably worth a read, if you're going to use the feature).

Related

How do I find the previous line without writing an inefficient subquery?

So I have this query (and have encountered or coded a bunch of similar ones during my life :) ) which is extremely inefficient in terms of performance, due to the subquery.
I'm running pgsql currently but have had this issue with mysql and mssql as well.
Sometimes I can use MAX() but here I have 2 different columns: runners.id (the one I need to find my data) and runners.updated_at (the one on which I could do MAX()).
Any tips?
SELECT
ROUND(CAST(AVG(DATE_PART('day', current_claim_event.updated_at - claims.created_at)) AS NUMERIC),1)
AS average, count(*)
FROM claims_events current_claim_event
INNER JOIN claims ON claims.id = current_claim_event.claim_id
WHERE current_claim_event.id = (
SELECT runners.id
FROM claims_events runners
WHERE runners.claim_id = current_claim_event.claim_id
ORDER BY runners.updated_at DESC
LIMIT 1
);

How to join records by date range

I need to match scrap records in one table with records indicating the material that was running at the same time on a machine. I have a table with the scrap counts and a table with records showing whenever the material changed on a machine.
I have a working query of which I will include a simplified version below, but it is very slow when applied to a large data set. I would like to try one of Oracle's analytical functions to make it faster, but I can't figure out how. I tried FIRST_VALUE, and ROW_NUMBER in a few different forms, but I couldn't get them right. Looking for any suggestions.
Please let me know if you would like more details.
Following are simplified versions of the tables:
Scrap readings table (~41m rows)
Machine
ScrapReasonCode
ScrapQuantity
ReportTime
Material numbers (~3m rows)
Machine
MaterialNumber
MEASUREMENT_TIMESTAMP
SELECT Scrap.Machine,
Scrap.MaterialNumber,
Scrap.ScrapReasonCode,
Scrap.ScrapQuantity,
Scrap.ReportTime
FROM Scrap, Materials
WHERE Scrap.Machine = Materials.Machine
AND Materials.MEASUREMENT_TIMESTAMP =
(SELECT MAX (M2.MEASUREMENT_TIMESTAMP)
FROM Materials M2
WHERE M2.Materials.Machine = Scrap.Machine
AND M2.MEASUREMENT_TIMESTAMP <= Scrap.ReportTime)
I think this is what you are trying to do. You can use the FIRST_VALUE window function.
SELECT DISTINCT
s.Machine,
s.MaterialNumber,
s.ScrapReasonCode,
s.ScrapQuantity,
s.ReportTime,
FIRST_VALUE(m.MEASUREMENT_TIMESTAMP) OVER(PARTITION BY s.Machine ORDER BY m.MEASUREMENT_TIMESTAMP DESC)
--or you can use the `MAX` window function too.
--MAX(m.MEASUREMENT_TIMESTAMP) OVER(PARTITION BY s.Machine)
FROM Scrap s
JOIN Materials m
WHERE s.Machine = m.Machine AND m.MEASUREMENT_TIMESTAMP <= s.ReportTime
I may be misunderstanding your requirements but I believe the following query should work in terms of implementing using ROW_NUMBER:
SELECT q.*
FROM (
SELECT ROW_NUMBER() OVER (PARTITION BY Scrap.Machine ORDER BY Materials.MEASUREMENT_TIMESTAMP DESC) AS RNO
Scrap.MaterialNumber,
Scrap.ScrapReasonCode,
Scrap.ScrapQuantity,
Scrap.ReportTime
FROM Scrap, Materials
WHERE Scrap.Machine = Materials.Machine
AND Materials.MEASUREMENT_TIMESTAMP <= Scrap.ReportTime
) q
WHERE q.RNO = 1
Edit: if you need the measurement timestamp before (rather than on-or-before) the Scrap ReportTime, you could just change the <= sign to a < sign in the query above.

Group By seems to add inordinate amount of computation in a simple Postgres query

I have a Postgres query as such:
select id,
from ads_1 as a
join ads_2 as b
on a.id_key = b.id_key
where b.date between '2014-01-01' and '2014-01-02'
group by id
order by id;
It's nothing fancy but works fine -- only takes about 3 minutes when querying a large database to return the result.
My question is, why does this slight modification to the above code cause the time for the query to more-than quadruple?
select id, b.ad_description
from ads_1 as a
join ads_2 as b
on a.id_key = b.id_key
where b.date between '2014-01-01' and '2014-01-02'
group by id, b.ad_description
order by id;
What is going on? The mere inclusion of one simple column of (albeit unique) information is bogging my query down. I am somehow asking Postgres to do a tremendously larger amount of work. For the life of me, I don't see how.
I'd like to preemptively apologize for not including any raw data. I'm hoping this simplified example of what I'm really facing is clear enough for some kind soul to make an enlightening comment. I can say that I'm going over a million rows in each table.
Thanks in advance.
i think the size of the return set is the issue. since your result set now has a millionish rows, that would explain the extra time (although it does seem excessive). a couple of things. first, the first column in the select is not bound to a range variable, it probably doesn't matter. but, you might want to make it select b.id, b.ad_description instead of id, b.ad_description. second, normally i use a group by when one of the columns is an aggregate that isn't in the group by statement, like a 'count()' or something. maybe i'm missing something, but, you might get the same result with:
select distinct b.id, b.ad_description
from ads_1 as a
join ads_2 as b
on a.id_key = b.id_key
where b.date between '2014-01-01' and '2014-01-02'
order by b.id, b.ad_description;
you might want to play with some of the numbers in the postgresql.conf file to beef up workspace/query space.
finally, i have this funny hunch that the order by and the group by matching might also help performance.
the LIMIT 5 wouldn't help much with performance, because the entire query would need to be finished before the first 5 rows would come out.
-g

Subqueries and AVG() on a subtraction

Working on a query to return the average time from when an employee begins his/her shift and then arrives at the first home (this DB assumes they are salesmen).
What I have:
SELECT l.OFFICE_NAME, crew.EMPLOYEE_NAME, //avg(first arrival time)
FROM LOCAL_OFFICE l, CREW_WORK_SCHEDULE crew,
WHERE l.LOCAL_OFFICE_ID = crew1.LOCAL_OFFICE_ID
You can see the AVG() command is commented out, because I know the time that they arrive at work, and the time they get to the first house, and can find the value using this:
(SELECT MIN(c.ARRIVE)
FROM ORDER_STATUS c
WHERE c.USER_ID = crew.CREW_ID)
-(SELECT START_TIME
FROM CREW_SHIFT_CODES
WHERE WORK_SHIFT_CODE = crew.WORK_SHIFT_CODE)
Would the best way be to simply put the above into the the AVG() parentheses? Just trying to learn the best methods to create queries. If you want more info on any of the tables, etc. just ask, but hopefully they're all named so you know what they're returning.
As per my comment, the example you gave would only return one record to the AVG function, and so not do very much.
If the sub-query was returning multiple records, however, your suggestion of placing the sub-query inside the AVG() would work...
SELECT
AVG((SELECT MIN(sub.val) FROM sub WHERE sub.id = main.id GROUP BY sub.group))
FROM
main
GROUP BY
main.group
(Averaging a set of minima, and so requiring two levels of GROUP BY.)
In many cases this gives good performance, and is maintainable. But sometimes the sub-query grows large, and it can be better to reformat it using an inline view...
SELECT
main.group,
AVG(sub_query.val)
FROM
main
INNER JOIN
(
SELECT
sub.id,
sub.group,
MIN(sub.val) AS val
FROM
sub
GROUP BY
sub.id
sub.group
)
AS sub_query
ON sub_query.id = main.id
GROUP BY
main.group
Note: Although this looks as though the inline view will calculate a lod of values that are not needed (and so be inefficient), most RDBMS optimise this so only the required records get processes. (The optimiser knows how the inner query is being used by the outer query, and builds the execution plan accordingly.)
Don't think of subqueries: they're often quite slow. In effect, they are row by row (RBAR) operations rather than set based
join all the table together
I've used a derived table to calculate the 1st arrival time
Aggregate
Soemthing like
SELECT
l.OFFICE_NAME, crew.EMPLOYEE_NAME,
AVG(os.minARRIVE - cs.START_TIME)
FROM
LOCAL_OFFICE l
JOIN
CREW_WORK_SCHEDULE crew On l.LOCAL_OFFICE_ID = crew1.LOCAL_OFFICE_ID
JOIN
CREW_SHIFT_CODES cs ON cs.WORK_SHIFT_CODE = crew.WORK_SHIFT_CODE
JOIN
(SELECT MIN(ARRIVE) AS minARRIVE, USER_ID
FROM ORDER_STATUS
GROUP BY USER_ID
) os ON oc.USER_ID = crew.CREW_ID
GROUP B
l.OFFICE_NAME, crew.EMPLOYEE_NAME
This probably won't give correct data because of the minARRIVE grouping: there isn't enough info from ORDER_STATUS to show "which day" or "which shift". It's simply "first arrival for that user for all time"
Edit:
This will give you average minutes
You can add this back to minARRIVE using DATEADD, or change to hh:mm with some %60 (modul0) and /60 (integer divide
AVG(
DATEDIFF(minute, os.minARRIVE, os.minARRIVE)
)

SQL conundrum, how to select latest date for part, but only 1 row per part (unique)

I am trying to wrap my head around this one this morning.
I am trying to show inventory status for parts (for our products) and this query only becomes complex if I try to return all parts.
Let me lay it out:
single table inventoryReport
I have a distinct list of X parts I wish to display, the result of which must be X # of rows (1 row per part showing latest inventory entry).
table is made up of dated entries of inventory changes (so I only need the LATEST date entry per part).
all data contained in this single table, so no joins necessary.
Currently for 1 single part, it is fairly simple and I can accomplish this by doing the following sql (to give you some idea):
SELECT TOP (1) ldDate, ptProdLine, inPart, inSite, inAbc, ptUm, inQtyOh + inQtyNonet AS in_qty_oh, inQtyAvail, inQtyNonet, ldCustConsignQty, inSuppConsignQty
FROM inventoryReport
WHERE (ldPart = 'ABC123')
ORDER BY ldDate DESC
that gets me my TOP 1 row, so simple per part, however I need to show all X (lets say 30 parts). So I need 30 rows, with that result. Of course the simple solution would be to loop X# of sql calls in my code (but it would be costly) and that would suffice, but for this purpose I would love to work this SQL some more to reduce the x# calls back to the db (if not needed) down to just 1 query.
From what I can see here I need to keep track of the latest date per item somehow while looking for my result set.
I would ultimately do a
WHERE ldPart in ('ABC123', 'BFD21', 'AA123', etc)
to limit the parts I need. Hopefully I made my question clear enough. Let me know if you have an idea. I cannot do a DISTINCT as the rows are not the same, the date needs to be the latest, and I need a maximum of X rows.
Thoughts? I'm stuck...
SELECT *
FROM (SELECT i.*,
ROW_NUMBER() OVER(PARTITION BY ldPart ORDER BY ldDate DESC) r
FROM inventoryReport i
WHERE ldPart in ('ABC123', 'BFD21', 'AA123', etc)
)
WHERE r = 1
EDIT: Be sure to test the performance of each solution. As pointed out in this question, the CTE method may outperform using ROW_NUMBER.
;with cteMaxDate as (
select ldPart, max(ldDate) as MaxDate
from inventoryReport
group by ldPart
)
SELECT md.MaxDate, ir.ptProdLine, ir.inPart, ir.inSite, ir.inAbc, ir.ptUm, ir.inQtyOh + ir.inQtyNonet AS in_qty_oh, ir.inQtyAvail, ir.inQtyNonet, ir.ldCustConsignQty, ir.inSuppConsignQty
FROM cteMaxDate md
INNER JOIN inventoryReport ir
on md.ldPart = ir.ldPart
and md.MaxDate = ir.ldDate
You need to join into a Sub-query:
SELECT i.ldPart, x.LastDate, i.inAbc
FROM inventoryReport i
INNER JOIN (Select ldPart, Max(ldDate) As LastDate FROM inventoryReport GROUP BY ldPart) x
on i.ldPart = x.ldPart and i.ldDate = x.LastDate