Using Multiple aggregate functions in the where clause - sql

We have a SELECT statement in production that takes quite a long time to run.
The current query uses the ROW_NUMBER window function.
I am trying to rewrite the query and test it. Since this is an ORC table, my assumption is that fetching aggregate values instead of using ROW_NUMBER may reduce the execution time.
Is something like this possible? Let me know if I am missing anything.
Sorry, I am still learning, so please bear with any mistakes.
I tried to rewrite the query as shown below.
Original query
SELECT
Q.id,
Q.crt_ts,
Q.upd_ts,
Q.exp_ts,
Q.biz_effdt
FROM (
SELECT u.id, u.crt_ts, u.upd_ts, u.exp_ts, u.biz_effdt, ROW_NUMBER() OVER (PARTITION BY u.id ORDER BY u.crt_ts DESC) AS ROW_N
FROM ( SELECT cust_prd.id, cust_prd.crt_ts, cust_prd.upd_ts, cust_prd.exp_ts, cust_prd.biz_effdt FROM MSTR_CORE.cust_prd
WHERE biz_effdt IN ( SELECT MAX(cust_prd.biz_effdt) FROM MSTR_CORE.cust_prd )
) U
)Q WHERE Q.row_n = 1
My attempt:
SELECT cust_prd.id, cust_prd.crt_ts, cust_prd.upd_ts, cust_prd.exp_ts, cust_prd.biz_effdt FROM MSTR_CORE.cust_prd
WHERE biz_effdt IN ( SELECT MAX(cust_prd.biz_effdt) FROM MSTR_CORE.cust_prd )
having cust_prd.crt_ts = max (cust_prd.crt_ts)

Related

How to join records by date range

I need to match scrap records in one table with records indicating the material that was running at the same time on a machine. I have a table with the scrap counts and a table with records showing whenever the material changed on a machine.
I have a working query of which I will include a simplified version below, but it is very slow when applied to a large data set. I would like to try one of Oracle's analytical functions to make it faster, but I can't figure out how. I tried FIRST_VALUE, and ROW_NUMBER in a few different forms, but I couldn't get them right. Looking for any suggestions.
Please let me know if you would like more details.
Following are simplified versions of the tables:
Scrap readings table (~41m rows): Machine, ScrapReasonCode, ScrapQuantity, ReportTime
Material numbers table (~3m rows): Machine, MaterialNumber, MEASUREMENT_TIMESTAMP
SELECT Scrap.Machine,
Materials.MaterialNumber,
Scrap.ScrapReasonCode,
Scrap.ScrapQuantity,
Scrap.ReportTime
FROM Scrap, Materials
WHERE Scrap.Machine = Materials.Machine
AND Materials.MEASUREMENT_TIMESTAMP =
(SELECT MAX (M2.MEASUREMENT_TIMESTAMP)
FROM Materials M2
WHERE M2.Machine = Scrap.Machine
AND M2.MEASUREMENT_TIMESTAMP <= Scrap.ReportTime)
I think this is what you are trying to do. You can use the FIRST_VALUE window function.
SELECT DISTINCT
s.Machine,
s.MaterialNumber,
s.ScrapReasonCode,
s.ScrapQuantity,
s.ReportTime,
FIRST_VALUE(m.MEASUREMENT_TIMESTAMP) OVER(PARTITION BY s.Machine ORDER BY m.MEASUREMENT_TIMESTAMP DESC)
--or you can use the `MAX` window function too.
--MAX(m.MEASUREMENT_TIMESTAMP) OVER(PARTITION BY s.Machine)
FROM Scrap s
JOIN Materials m
ON s.Machine = m.Machine AND m.MEASUREMENT_TIMESTAMP <= s.ReportTime
I may be misunderstanding your requirements but I believe the following query should work in terms of implementing using ROW_NUMBER:
SELECT q.*
FROM (
SELECT ROW_NUMBER() OVER (PARTITION BY Scrap.Machine ORDER BY Materials.MEASUREMENT_TIMESTAMP DESC) AS RNO,
Scrap.MaterialNumber,
Scrap.ScrapReasonCode,
Scrap.ScrapQuantity,
Scrap.ReportTime
FROM Scrap, Materials
WHERE Scrap.Machine = Materials.Machine
AND Materials.MEASUREMENT_TIMESTAMP <= Scrap.ReportTime
) q
WHERE q.RNO = 1
Edit: if you need the measurement timestamp before (rather than on-or-before) the Scrap ReportTime, you could just change the <= sign to a < sign in the query above.
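To make the ROW_NUMBER approach concrete, here is a minimal sketch run against SQLite from Python (SQLite only stands in for Oracle here, and the sample rows are made up). One assumption to flag: it partitions by Machine, ReportTime and ScrapReasonCode, on the guess that those columns identify a single scrap reading. Partitioning by Machine alone would keep only one row per machine, so adjust the partition key to whatever identifies a reading in the real table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Scrap (Machine TEXT, ScrapReasonCode TEXT, ScrapQuantity INTEGER, ReportTime TEXT);
CREATE TABLE Materials (Machine TEXT, MaterialNumber TEXT, MEASUREMENT_TIMESTAMP TEXT);
-- Made-up sample data: M1 switches from MAT-A to MAT-B at noon.
INSERT INTO Materials VALUES
  ('M1', 'MAT-A', '2020-01-01 08:00'),
  ('M1', 'MAT-B', '2020-01-01 12:00'),
  ('M2', 'MAT-C', '2020-01-01 09:00');
INSERT INTO Scrap VALUES
  ('M1', 'R1', 5, '2020-01-01 10:00'),
  ('M1', 'R2', 3, '2020-01-01 13:00'),
  ('M2', 'R1', 7, '2020-01-01 09:30');
""")

query = """
SELECT Machine, MaterialNumber, ScrapReasonCode, ScrapQuantity, ReportTime
FROM (
    SELECT s.Machine, m.MaterialNumber, s.ScrapReasonCode, s.ScrapQuantity, s.ReportTime,
           -- Number the candidate material rows per scrap reading, newest first.
           ROW_NUMBER() OVER (
               PARTITION BY s.Machine, s.ReportTime, s.ScrapReasonCode
               ORDER BY m.MEASUREMENT_TIMESTAMP DESC
           ) AS rno
    FROM Scrap s
    JOIN Materials m
      ON m.Machine = s.Machine
     AND m.MEASUREMENT_TIMESTAMP <= s.ReportTime
) q
WHERE rno = 1
ORDER BY Machine, ReportTime
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Each scrap reading comes back once, paired with the material that was most recently set on that machine at or before the report time.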

Eliminating Entries Based On Revision

I need to figure out how to eliminate older revisions from my query's results. My database stores orders as 'Q000000', and revisions have an appended '-number'. My query is currently as follows:
SELECT DISTINCT Estimate.EstimateNo
FROM Estimate
INNER JOIN EstimateDetails ON EstimateDetails.EstimateID = Estimate.EstimateID
INNER JOIN EstimateDoorList ON EstimateDoorList.ItemSpecID = EstimateDetails.ItemSpecID
WHERE (Estimate.SalesRepID = '67' OR Estimate.SalesRepID = '61') AND Estimate.EntryDate >= '2017-01-01 00:00:00.000' AND EstimateDoorList.SlabSpecies LIKE '%MDF%'
ORDER BY Estimate.EstimateNo
So for instance, the results would include:
Q120455-10
Q120455-11
Q121675-2
Q122361-1
Q123456
Q123456-1
From this, I need to eliminate 'Q120455-10' because of the presence of '-11' for that order, and 'Q123456' because of the presence of the '-1' revision. I'm struggling with how to do this. My immediate thought was to use CASE statements, but I'm not sure of the best way to implement them or how to filter. Thank you in advance, and let me know if any more information is needed.
First you have to parse your EstimateNo column into an order number and a revision number using CHARINDEX and SUBSTRING (or STRING_SPLIT in newer versions), and CAST/CONVERT the revision to a numeric type so that '-11' sorts after '-2':
SELECT
LEFT(Estimate.EstimateNo, CHARINDEX('-', Estimate.EstimateNo + '-') - 1) as [EstimateNo],
-- no '-' suffix: treat the original estimate as revision 0
CASE WHEN CHARINDEX('-', Estimate.EstimateNo) > 0
THEN CAST(SUBSTRING(Estimate.EstimateNo, CHARINDEX('-', Estimate.EstimateNo) + 1, LEN(Estimate.EstimateNo)) as INT)
ELSE 0 END as [EstimateRevision]
FROM
...
You can then use
APPLY - to select TOP 1 row that matches the EstimateNo or
Window function such as ROW_NUMBER to select only records with row number of 1
For example, using a ROW_NUMBER would look something like below:
SELECT
ROW_NUMBER() OVER(PARTITION BY EstimateNo ORDER BY EstimateRevision DESC) AS "LastRevisionForEstimate",
-- rest of the needed columns
FROM
(
-- query above goes here
) AS parsed
You can then wrap the query above in a simple select with a where predicate filtering out a specific value of LastRevisionForEstimate, for instance
SELECT --needed columns
FROM -- result set above
WHERE LastRevisionForEstimate = 1
Please note that this is, to a certain extent, pseudocode, as I do not have your schema and cannot test the query.
If you dislike the nested selects, check out Common Table Expressions.
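As a rough end-to-end illustration of the parse-then-ROW_NUMBER idea written with Common Table Expressions, here is a small sketch run against SQLite from Python. SQLite is only a stand-in for SQL Server, so instr/substr replace CHARINDEX/SUBSTRING. The single-column Estimate table and the sample values (modeled on the question) are assumptions, as is treating an estimate without a '-' suffix as revision 0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Estimate (EstimateNo TEXT)")
conn.executemany(
    "INSERT INTO Estimate VALUES (?)",
    [("Q120455-10",), ("Q120455-11",), ("Q121675-2",),
     ("Q122361-1",), ("Q123456",), ("Q123456-1",)],
)

query = """
WITH parsed AS (
    -- Split 'Q120455-11' into base 'Q120455' and numeric revision 11.
    SELECT EstimateNo AS FullNo,
           CASE WHEN instr(EstimateNo, '-') > 0
                THEN substr(EstimateNo, 1, instr(EstimateNo, '-') - 1)
                ELSE EstimateNo END AS BaseNo,
           CASE WHEN instr(EstimateNo, '-') > 0
                THEN CAST(substr(EstimateNo, instr(EstimateNo, '-') + 1) AS INTEGER)
                ELSE 0 END AS Revision
    FROM Estimate
),
ranked AS (
    -- Row 1 within each base number is the highest revision.
    SELECT FullNo,
           ROW_NUMBER() OVER (PARTITION BY BaseNo ORDER BY Revision DESC) AS rn
    FROM parsed
)
SELECT FullNo FROM ranked WHERE rn = 1 ORDER BY FullNo
"""
latest = [row[0] for row in conn.execute(query)]
print(latest)
```

The cast to an integer matters: compared as text, '2' would sort after '11'.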

How to sum only the first row for each group in a result set

Ok, I will try to explain myself the best I can, but I have the following:
I have a datasource that basically is a dynamic query. The query in itself shows 3 fields, Name, Amount1, Amount2.
Now, I could have rows with the same Name. The idea is to sum Amount1+Amount2 only WHEN Name is different from the previous one I saved. If I did this in C#, it could be something like this:
foreach (DataRow dr in repDset.Dataset.Rows)
{
total = (long)dr["Amount1"] + (long)dr["Amount2"];
if (thisconditiontrue)
{
if (PreviousName == "" || PreviousName != dr["Name"].ToString())
{
TotalName = TotalName + total;
}
PreviousName = dr["Name"].ToString();
}
}
The idea is to grab this and make a Reporting Services expression using the methods RS can give me, for example:
IIF(Fields!Name.Value<>Previous(Fields!Name.Value),Fields!Amount1.Value + Fields!Amount2.Value,False)
Something like that but that stores the amount of the previous one.
Maybe creating another field? a calculated one?
I can clarify further and edit if needed.
*EDIT for visual clarification:
As an example, it is something like this:
This query assumes you're working with SQL Server. You will need something to order the query results by; otherwise, how do you know which row is the first one?
SELECT SUM(NameTotal) AS Total
FROM (
SELECT Name, Amount1 + Amount2 AS NameTotal,
ROW_NUMBER() OVER (PARTITION BY Name ORDER BY OrderField) AS rowNum
FROM srcTable
) AS a
WHERE rowNum=1;
This uses the analytic window function ROW_NUMBER() to number each row, and the PARTITION BY clause tells it to restart the numbering for every different value of Name in the result set. You do need a field that you can order the results by, though, or this won't work. If you really just want a random order you can use ORDER BY NEWID(), but that will give you a non-deterministic result.
This was written for SQL Server, but ROW_NUMBER() is available in most other databases; NEWID() is the SQL Server-specific part.
If you're looking to display the output like you've shown in your example you could use two queries and reference the other one by passing it as the scope to an aggregate function in an SSRS expression like this:
=MAX(Fields!Total.Value, "TotalQueryDataset")
Where your dataset is called "TotalQueryDataset".
Otherwise you can achieve the output using pure SQL like this:
WITH nameTotals AS (
SELECT Name, Amount1, Amount2,
ROW_NUMBER() OVER (PARTITION BY Name ORDER BY OrderField) AS rowNum
FROM srcTable
)
SELECT Name, Amount1, Amount2
FROM nameTotals
UNION ALL
SELECT 'Total', SUM(Amount1 + Amount2), NULL
FROM nameTotals
WHERE rowNum=1;
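For anyone who wants to check the "first row per Name" total on a toy example, here is a small sketch run against SQLite from Python (SQLite standing in for SQL Server). The table and column names follow the queries above, OrderField is the same hypothetical ordering column, and the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE srcTable (OrderField INTEGER, Name TEXT, Amount1 INTEGER, Amount2 INTEGER)"
)
conn.executemany(
    "INSERT INTO srcTable VALUES (?, ?, ?, ?)",
    [
        (1, "Alice", 10, 5),   # first Alice row: counted (15)
        (2, "Alice", 99, 99),  # repeated Name: ignored
        (3, "Bob", 20, 1),     # first Bob row: counted (21)
        (4, "Bob", 50, 50),    # repeated Name: ignored
        (5, "Carol", 7, 3),    # only Carol row: counted (10)
    ],
)

query = """
SELECT SUM(NameTotal)
FROM (
    SELECT Amount1 + Amount2 AS NameTotal,
           ROW_NUMBER() OVER (PARTITION BY Name ORDER BY OrderField) AS rowNum
    FROM srcTable
) AS a
WHERE rowNum = 1
"""
total = conn.execute(query).fetchone()[0]
print(total)  # 15 + 21 + 10 = 46
```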

Access 2013 - Query not returning correct Number of Results

I am trying to get the query below to return the TWO lowest PlayedTo results for each PlayerID.
select
x1.PlayerID, x1.RoundID, x1.PlayedTo
from P_7to8Calcs as x1
where
(
select count(*)
from P_7to8Calcs as x2
where x2.PlayerID = x1.PlayerID
and x2.PlayedTo <= x1.PlayedTo
) <3
order by PlayerID, PlayedTo, RoundID;
Unfortunately at the moment it doesn't return a result when there is a tie for one of the lowest scores. A copy of the dataset and code is here http://sqlfiddle.com/#!3/4a9fc/13.
PlayerID 47 has only one result returned as there are two different RoundID's that are tied for the second lowest PlayedTo. For what I am trying to calculate it doesn't matter which of these two it returns as I just need to know what the number is but for reporting I ideally need to know the one with the newest date.
One other slight problem with the query is the time it takes to run. It takes about 2 minutes in Access to run through the 83 records but it will need to run on about 1000 records when the database is fully up and running.
Any help will be much appreciated.
Resolve the tie by adding DatePlayed to your internal sorting (you wanted the one with the newest date anyway):
select
x1.PlayerID, x1.RoundID
, x1.PlayedTo
from P_7to8Calcs as x1
where
(
select count(*)
from P_7to8Calcs as x2
where x2.PlayerID = x1.PlayerID
and (x2.PlayedTo < x1.PlayedTo
or x2.PlayedTo = x1.PlayedTo
and x2.DatePlayed >= x1.DatePlayed
)
) <3
order by PlayerID, PlayedTo, RoundID;
For performance create an index supporting the join condition. Something like:
create index P_7to8Calcs__PlayerID_RoundID on P_7to8Calcs(PlayerId, PlayedTo);
Note: I used your SQLFiddle as I do not have Access available here.
Edit: In case the index does not improve performance enough, you might want to try the following query using window functions (which avoids the nested subquery). It works in your SQLFiddle, but I am not sure whether Access supports it.
select x1.PlayerID, x1.RoundID, x1.PlayedTo
from (
select PlayerID, RoundID, PlayedTo
, RANK() OVER (PARTITION BY PlayerId ORDER BY PlayedTo, DatePlayed DESC) AS Rank
from P_7to8Calcs
) as x1
where x1.RANK < 3
order by PlayerID, PlayedTo, RoundID;
See OVER clause and Ranking Functions for documentation.
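A variation worth considering, sketched here against SQLite from Python: RANK() gives tied rows the same rank when PlayedTo and DatePlayed both match, so this version swaps in ROW_NUMBER() with RoundID as a final tie-breaker, which always yields at most two rows per player. The table layout follows the queries above, the sample rows are made up, and whether the target engine supports window functions still needs checking.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE P_7to8Calcs (PlayerID INTEGER, RoundID INTEGER, PlayedTo INTEGER, DatePlayed TEXT)"
)
conn.executemany(
    "INSERT INTO P_7to8Calcs VALUES (?, ?, ?, ?)",
    [
        (47, 1, 10, "2014-01-01"),
        (47, 2, 12, "2014-01-08"),  # tied for second lowest, older
        (47, 3, 12, "2014-01-15"),  # tied for second lowest, newer
        (47, 4, 15, "2014-01-22"),
        (48, 5, 9, "2014-01-01"),
        (48, 6, 11, "2014-01-08"),
        (48, 7, 14, "2014-01-15"),
    ],
)

query = """
SELECT PlayerID, RoundID, PlayedTo
FROM (
    SELECT PlayerID, RoundID, PlayedTo,
           -- Lowest PlayedTo first; on a tie prefer the newest date, then the highest RoundID.
           ROW_NUMBER() OVER (
               PARTITION BY PlayerID
               ORDER BY PlayedTo, DatePlayed DESC, RoundID DESC
           ) AS rn
    FROM P_7to8Calcs
) AS x1
WHERE rn < 3
ORDER BY PlayerID, PlayedTo, RoundID
"""
two_lowest = conn.execute(query).fetchall()
for row in two_lowest:
    print(row)
```

For player 47 the tie at PlayedTo = 12 is resolved in favour of the round with the newer date.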

What's wrong with this Oracle query?

Below is a query generated by the PetaPoco ORM for .NET. I don't have an Oracle client right now to debug it and I can't see anything obviously wrong (but I'm a SQL Server guy). Can anyone tell me why it is producing this error:
Oracle.DataAccess.Client.OracleException ORA-00923: FROM keyword not found where expected
SELECT *
FROM (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) peta_rn,
"ON_CUST_MAS"."CU_NO",
"ON_CUST_MAS"."CU_NAME",
"ON_CUST_MAS"."CU_TYPE",
"ON_CUST_MAS"."CONTACT",
"ON_CUST_MAS"."ADD1_SH",
"ON_CUST_MAS"."ADD2_SH",
"ON_CUST_MAS"."CITY_SH",
"ON_CUST_MAS"."POST_CODE",
"ON_CUST_MAS"."PROV_SH",
"ON_CUST_MAS"."COUNTRY",
"ON_CUST_MAS"."PHONE_NU",
"ON_CUST_MAS"."FAX_NU",
"ON_CUST_MAS"."EMAIL",
"ON_CUST_MAS"."PU_ORDER_FL",
"ON_CUST_MAS"."CREDIT_AMOUNT"
FROM "ON_CUST_MAS" ) peta_paged
WHERE peta_rn>0 AND peta_rn<=20
Edit: Just in case it helps, this is a paging query. Regular queries (select all, select by ID) are working fine.
The problem is that the SELECT NULL in the ORDER BY clause of your analytic function is syntactically incorrect.
over (ORDER BY (SELECT NULL))
could be rewritten
(ORDER BY (SELECT NULL from dual))
or more simply
(ORDER BY null)
Of course, it doesn't really make sense to get a row_number if you aren't ordering the results by anything. There is no reason to expect that the set of rows that are returned would be consistent-- you could get any set of 20 rows arbitrarily. And if you go to the second page of results, there is no reason to expect that the second page of results would be completely different than the first page or that any particular result would appear on any page if you page through the entire result set.
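There should be an order defined within the ORDER BY clause. For example, let's say your elements are displayed in order of column "ON_CUST_MAS"."CU_NO"; then your query should look like this (note that quoted identifiers are case-sensitive in Oracle, so keep the original uppercase names):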
SELECT *
FROM (SELECT ROW_NUMBER()
OVER (
ORDER BY "ON_CUST_MAS"."CU_NO") peta_rn,
"ON_CUST_MAS"."CU_NO",
"ON_CUST_MAS"."CU_NAME",
"ON_CUST_MAS"."CU_TYPE",
"ON_CUST_MAS"."CONTACT",
"ON_CUST_MAS"."ADD1_SH",
"ON_CUST_MAS"."ADD2_SH",
"ON_CUST_MAS"."CITY_SH",
"ON_CUST_MAS"."POST_CODE",
"ON_CUST_MAS"."PROV_SH",
"ON_CUST_MAS"."COUNTRY",
"ON_CUST_MAS"."PHONE_NU",
"ON_CUST_MAS"."FAX_NU",
"ON_CUST_MAS"."EMAIL",
"ON_CUST_MAS"."PU_ORDER_FL",
"ON_CUST_MAS"."CREDIT_AMOUNT"
FROM "ON_CUST_MAS") peta_paged
WHERE peta_rn > 0
AND peta_rn <= 20
If a different column sets the order, just switch it within the ORDER BY clause. Some order should always be defined; otherwise it is not guaranteed to stay the same between executions, and you can't be sure what will be displayed on any given page.
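To illustrate why a deterministic ORDER BY makes the pages well defined, here is a minimal paging sketch run against SQLite from Python. SQLite stands in for Oracle, the two-column ON_CUST_MAS table is a cut-down assumption, and fetch_page is a hypothetical helper, not part of PetaPoco.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ON_CUST_MAS (CU_NO INTEGER PRIMARY KEY, CU_NAME TEXT)")
conn.executemany(
    "INSERT INTO ON_CUST_MAS VALUES (?, ?)",
    [(n, f"Customer {n}") for n in range(1, 51)],  # 50 made-up customers
)

def fetch_page(page, page_size=20):
    """Return one page of customers, numbered by a stable ordering on CU_NO."""
    first = (page - 1) * page_size
    last = page * page_size
    query = """
    SELECT CU_NO, CU_NAME
    FROM (
        SELECT ROW_NUMBER() OVER (ORDER BY CU_NO) AS peta_rn, CU_NO, CU_NAME
        FROM ON_CUST_MAS
    ) peta_paged
    WHERE peta_rn > ? AND peta_rn <= ?
    ORDER BY peta_rn
    """
    return conn.execute(query, (first, last)).fetchall()

page1 = fetch_page(1)
page2 = fetch_page(2)
page3 = fetch_page(3)
print(len(page1), len(page2), len(page3))
```

Because the row numbers are assigned over a unique column, the pages do not overlap and together cover every row exactly once.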