Optimization of multiple aggregate sorting in SQL - sql

I have a Postgres query written for a Spree Commerce store that sorts all of its products in the following order: In stock (then first available), Backorder (then first available), Sold out (then first available).
In order to chain it with rails scopes I had to put it in the order by clause as opposed to anywhere else. The query itself works, and is fairly performant, but complex. I was curious if anyone with a bit more knowledge could discuss a better way to do it? I'm interested in performance, but also different ways to approach the problem.
ORDER BY (
SELECT
CASE
WHEN tt.count_on_hand > 0
THEN 2
WHEN zz.backorderable = true
THEN 1
ELSE 0
END
FROM (
SELECT
row_number() OVER (dpartition),
z.id,
bool_or(backorderable) OVER (dpartition) as backorderable
FROM (
SELECT DISTINCT ON (spree_variants.id) spree_products.id, spree_stock_items.backorderable as backorderable
FROM spree_products
JOIN "spree_variants" ON "spree_variants"."product_id" = "spree_products"."id" AND "spree_variants"."deleted_at" IS NULL
JOIN "spree_stock_items" ON "spree_stock_items"."variant_id" = "spree_variants"."id" AND "spree_stock_items"."deleted_at" IS NULL
JOIN "spree_stock_locations" ON spree_stock_locations.id=spree_stock_items.stock_location_id
WHERE spree_stock_locations.active = true
) z window dpartition as (PARTITION by id)
) zz
JOIN (
SELECT
row_number() OVER (dpartition),
t.id,
sum(count_on_hand) OVER (dpartition) as count_on_hand
FROM (
SELECT DISTINCT ON (spree_variants.id) spree_products.id, spree_stock_items.count_on_hand as count_on_hand
FROM spree_products
JOIN "spree_variants" ON "spree_variants"."product_id" = "spree_products"."id" AND "spree_variants"."deleted_at" IS NULL
JOIN "spree_stock_items" ON "spree_stock_items"."variant_id" = "spree_variants"."id" AND "spree_stock_items"."deleted_at" IS NULL
) t window dpartition as (PARTITION by id)
) tt ON tt.row_number = 1 AND tt.id = spree_products.id
WHERE zz.row_number = 1 AND zz.id=spree_products.id
) DESC, available_on DESC
The FROM shown above determines whether or not a product is backorderable, and the JOIN shown above determines the stock in inventory. Note that these are very similar queries, except that whether something is backorderable depends on a location's ability to support backorders and on its state, hence WHERE spree_stock_locations.active = true.
Thanks for any advice!
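One possible direction, sketched here under loud assumptions: aggregate once per product with GROUP BY and rank with a CASE, instead of two windowed subqueries. The tables below are hypothetical, trimmed-down stand-ins for the Spree schema, SQLite stands in for Postgres, and the sketch sums every stock item rather than one per variant as DISTINCT ON does, so edge cases may differ from the original query.

```python
import sqlite3

# Hypothetical, simplified tables; the real Spree schema has many more columns.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (id INTEGER PRIMARY KEY, available_on TEXT);
CREATE TABLE variants (id INTEGER PRIMARY KEY, product_id INTEGER);
CREATE TABLE stock_locations (id INTEGER PRIMARY KEY, active INTEGER);
CREATE TABLE stock_items (variant_id INTEGER, stock_location_id INTEGER,
                          count_on_hand INTEGER, backorderable INTEGER);
INSERT INTO products VALUES
  (1, '2020-01-01'), (2, '2020-02-01'), (3, '2020-03-01'), (4, '2020-04-01');
INSERT INTO variants VALUES (10, 1), (20, 2), (30, 3), (40, 4);
INSERT INTO stock_locations VALUES (100, 1), (200, 0);
INSERT INTO stock_items VALUES
  (10, 100, 0, 0),   -- product 1: sold out
  (20, 100, 0, 1),   -- product 2: backorderable at an active location
  (30, 100, 5, 0),   -- product 3: in stock
  (40, 200, 0, 1);   -- product 4: backorderable only at an inactive location
""")

# 2 = in stock, 1 = backorderable at an active location, 0 = sold out.
rows = conn.execute("""
SELECT p.id,
       CASE WHEN COALESCE(SUM(si.count_on_hand), 0) > 0 THEN 2
            WHEN MAX(CASE WHEN sl.active = 1 THEN si.backorderable ELSE 0 END) = 1 THEN 1
            ELSE 0
       END AS stock_rank
FROM products p
JOIN variants v         ON v.product_id = p.id
JOIN stock_items si     ON si.variant_id = v.id
JOIN stock_locations sl ON sl.id = si.stock_location_id
GROUP BY p.id, p.available_on
ORDER BY stock_rank DESC, p.available_on DESC
""").fetchall()

print(rows)
```

Whether this can be chained with Rails scopes as easily as a pure ORDER BY expression is not something the sketch addresses.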

Related

Conditionally use CASE...WHEN - Oracle SQL

I have two tables like so:
tblOrders: OrderNo (pk), CurrentStepNo (fk)
tblSteps: StepNo (pk), OrderNo (fk), StepName, StepType, StepStart, StepStop
tblOrders contains tons of information about our sales orders, while tblSteps contains tons of information regarding the proper sequential steps it takes to build the material we are selling.
I am trying to construct a query that follows this logic:
"For all orders, select the current step name from the step table. If
the Step Type is equal to 'XO', then select the most recently
completed (where StepStop is not null) regular step (where StepType is
equal to 'YY')"
I have the following query:
SELECT
tblOrders.*,
tblSteps.StepName
FROM
tblOrders
INNER JOIN tblSteps
ON tblOrders.OrderNo = tblSteps.OrderNo
AND tblOrders.CurrentStepNo = tblSteps.StepNo
Which successfully returns to me the current step name for an in-process order. What I need to achieve is, when the tblOrders.CurrentStepNo is of type 'XO', to find the MAX(tblSteps.StepStop) WHERE tblSteps.StepType = 'YY'. However, I am having trouble putting that logic into my already working query.
Note: I am sorry for the lack of sample data in this example. I would normally post but cannot in this instance. This is also not a homework question.
I have reviewed these references:
Case in Select Statement
https://blogs.msdn.microsoft.com/craigfr/2006/08/23/subqueries-in-case-expressions/
But no luck so far.
I have tried this:
SELECT
tblOrders.*,
CASE
WHEN tblSteps.StepType = 'XO' THEN (-- Some logic here)
ELSE tblSteps.StepName
END AS StepName
FROM
tblOrders
INNER JOIN tblSteps
ON tblOrders.OrderNo = tblSteps.OrderNo
AND tblOrders.CurrentStepNo = tblSteps.StepNo
But I am struggling to properly formulate the logic.
Join all steps, rank them with ROW_NUMBER, and stay with the best ranked:
select *
from
(
select
o.*,
s.*,
row_number() over
(partition by o.orderno
order by case when s.steptype <> 'XO' and s.stepno = o.currentstepno then 1
when s.steptype = 'YY' then 2
else 3 end, s.stepstop desc nulls last) as rn
from tblorders o
join tblsteps s on s.orderno = o.orderno
) ranked
where rn = 1
order by orderno;
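A minimal runnable sketch of this ranking idea, using SQLite through Python as a stand-in for Oracle. The sample rows are invented, regular steps are assumed to have StepType = 'YY' as the question states, and `StepStop IS NULL` is used as a sort key to emulate NULLS LAST:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tblOrders (OrderNo INTEGER PRIMARY KEY, CurrentStepNo INTEGER);
CREATE TABLE tblSteps (StepNo INTEGER PRIMARY KEY, OrderNo INTEGER,
                       StepName TEXT, StepType TEXT, StepStop TEXT);
-- Made-up sample rows: order 1 sits on a regular step, order 2 on an 'XO' step.
INSERT INTO tblOrders VALUES (1, 11), (2, 23);
INSERT INTO tblSteps VALUES
  (10, 1, 'Cut',    'YY', '2020-01-01'),
  (11, 1, 'Weld',   'YY', NULL),
  (21, 2, 'Cut',    'YY', '2020-01-01'),
  (22, 2, 'Polish', 'YY', '2020-01-05'),
  (23, 2, 'Hold',   'XO', NULL);
""")

step_names = conn.execute("""
SELECT OrderNo, StepName
FROM (
  SELECT o.OrderNo, s.StepName,
         ROW_NUMBER() OVER (
           PARTITION BY o.OrderNo
           ORDER BY CASE WHEN s.StepType <> 'XO' AND s.StepNo = o.CurrentStepNo THEN 1
                         WHEN s.StepType = 'YY' THEN 2
                         ELSE 3 END,
                    s.StepStop IS NULL,   -- completed steps sort first
                    s.StepStop DESC       -- most recently completed first
         ) AS rn
  FROM tblOrders o
  JOIN tblSteps s ON s.OrderNo = o.OrderNo
) ranked
WHERE rn = 1
ORDER BY OrderNo
""").fetchall()

print(step_names)
```

Order 1 keeps its current step, while order 2 falls back to its most recently completed 'YY' step.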

LAG() function in sql 2008

I have looked at a few other questions regarding this problem. We are trying to get a stored procedure working that contains the LAG() function, but the machine we are now installing an instance on runs SQL Server 2008, where LAG() is not available:
SELECT se.SetID,SetName,ParentSetId,
qu.QuestionID,qu.QuestionText,qu.QuestionTypeID,qu.IsPublished,qu.IsFilter,
qu.IsRequired,qu.QueCode,qu.IsDisplayInTable,
Case when (LAG(se.ParentSetId) OVER(ORDER BY se.ParentSetId) <> ParentSetId) then 2 else 1 end level ,
QuestionType
FROM tblSet se
LEFT join tblQuestion qu on qu.SetID=se.SetID
Inner join tblQuestionType qt on qt.QuestionTypeID=qu.QuestionTypeID and qt.IsAnswer=1
where CollectionId=@colID and se.IsDeleted=0
order by se.SetID
What I've tried so far (edited to reflect Zohar Peled's suggestion):
SELECT se.SetID,se.SetName,se.ParentSetId,
qu.QuestionID,qu.QuestionText,qu.QuestionTypeID,qu.IsPublished,qu.IsFilter,
qu.IsRequired,qu.QueCode,qu.IsDisplayInTable,
(case when row_number() over (partition by se.parentsetid
order by se.parentsetid
) = 1
then 1 else 2
end) as level,
QuestionType
FROM tblSet se
left join tblSet se2 on se.ParentSetId = se2.ParentSetId -1
LEFT join tblQuestion qu on qu.SetID=se.SetID
Inner join tblQuestionType qt on qt.QuestionTypeID=qu.QuestionTypeID and qt.IsAnswer=1
where se.CollectionId=@colID and se.IsDeleted=0
order by se.SetID
It does not seem to return all of the same records when I run the two side by side, and the level values also differ.
I have put some of the output into an HTML table: the first set of results comes from the version containing LAG(), and the second from the new version, where the levels do not match.
https://jsfiddle.net/gyn8Lv3u/
LAG() can be implemented using a self-join as Jeroen wrote in his comment, or by using a correlated subquery. In this case, it's a simple lag() so the correlated subquery is also simple:
SELECT se.SetID,SetName,ParentSetId,
qu.QuestionID,qu.QuestionText,qu.QuestionTypeID,qu.IsPublished,qu.IsFilter,
qu.IsRequired,qu.QueCode,qu.IsDisplayInTable,
Case when (
(
SELECT TOP 1 ParentSetId
FROM tblSet seInner
WHERE seInner.ParentSetId < se.ParentSetId
ORDER BY seInner.ParentSetId DESC
)
<> ParentSetId) then 2 else 1 end level ,
QuestionType
FROM tblSet se
LEFT join tblQuestion qu on qu.SetID=se.SetID
Inner join tblQuestionType qt on qt.QuestionTypeID=qu.QuestionTypeID and qt.IsAnswer=1
where CollectionId=@colID and se.IsDeleted=0
order by se.SetID
If you had specified an offset, it would be harder to implement using a correlated subquery, and a self-join would make a much easier solution.
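As a rough check of the correlated-subquery approach, the sketch below compares it with a real LAG() on a tiny invented table. SQLite (via Python) stands in for SQL Server, so `LIMIT 1` replaces `TOP 1`. The two agree here only because the ParentSetId values are distinct; with duplicates, the `<` comparison skips equal values while LAG() does not.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tblSet (SetID INTEGER PRIMARY KEY, ParentSetId INTEGER);
-- Hypothetical rows with distinct ParentSetId values.
INSERT INTO tblSet VALUES (1, 10), (2, 20), (3, 35), (4, 50);
""")

# Reference result using the window function.
with_lag = conn.execute("""
SELECT SetID, LAG(ParentSetId) OVER (ORDER BY ParentSetId) AS prev
FROM tblSet
ORDER BY SetID
""").fetchall()

# Emulation: the largest ParentSetId strictly below the current one.
with_subquery = conn.execute("""
SELECT se.SetID,
       (SELECT i.ParentSetId
        FROM tblSet i
        WHERE i.ParentSetId < se.ParentSetId
        ORDER BY i.ParentSetId DESC
        LIMIT 1) AS prev
FROM tblSet se
ORDER BY se.SetID
""").fetchall()

print(with_lag)
print(with_subquery)
```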
Sample data and desired results would help. This construct:
(case when (LAG(se.ParentSetId) OVER(ORDER BY se.ParentSetId) <> ParentSetId) then 2 else 1
end) as level
is quite strange. You are lagging by the only column used in the order by. That makes sense. But then you are comparing the value to the same column, implying that there are duplicates.
If you have duplicates, then order by se.ParentSetId is unstable. That is, the "previous" row is indeterminate because of the duplicate values being ordered. You can run the query twice and get different results.
I am guessing you want one row with the value 1 for each parent set id. If so, then in either database, you would use:
(case when row_number() over (partition by se.parentsetid
order by se.parentsetid
) = 1
then 1 else 2
end) as level
This also has the problem with an unstable ordering. You can fix this by changing the order by to what you really want.
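To illustrate the point about stable ordering, here is a small sketch in which the window is ordered by SetID inside each parent set. Using SetID as the tie-breaker is an assumption made for the example, not something stated in the question. SQLite via Python stands in for SQL Server, and the rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tblSet (SetID INTEGER PRIMARY KEY, ParentSetId INTEGER);
-- Invented rows with duplicate ParentSetId values.
INSERT INTO tblSet VALUES (1, 10), (2, 10), (3, 20), (4, 20), (5, 20);
""")

# Ordering by the unique SetID makes "first row per parent set" deterministic.
levels = conn.execute("""
SELECT SetID, ParentSetId,
       CASE WHEN ROW_NUMBER() OVER (PARTITION BY ParentSetId
                                    ORDER BY SetID) = 1
            THEN 1 ELSE 2 END AS level
FROM tblSet
ORDER BY SetID
""").fetchall()

print(levels)
```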

Modify my SQL Server query -- returns too many rows sometimes

I need to update the following query so that it only returns one child record (remittance) per parent (claim).
Table Remit_To_Activate contains exactly one date/timestamp per claim, which is what I wanted.
But when I join the full Remittance table to it, since some claims have multiple remittances with the same date/timestamps, the outermost query returns more than 1 row per claim for those claim IDs.
SELECT * FROM REMITTANCE
WHERE BILLED_AMOUNT>0 AND ACTIVE=0
AND REMITTANCE_UUID IN (
SELECT REMITTANCE_UUID FROM Claims_Group2 G2
INNER JOIN Remit_To_Activate t ON (
(t.ClaimID = G2.CLAIM_ID) AND
(t.DATE_OF_LATEST_REGULAR_REMIT = G2.CREATE_DATETIME)
)
where ACTIVE=0 and BILLED_AMOUNT>0
)
I believe the problem would be resolved if I included REMITTANCE_UUID as a column in Remit_To_Activate. That's the REAL issue. This is how I created the Remit_To_Activate table (trying to get the most recent remittance for a claim):
SELECT MAX(create_datetime) as DATE_OF_LATEST_REMIT,
MAX(claim_id) AS ClaimID
INTO Latest_Remit_To_Activate
FROM Claims_Group2
WHERE BILLED_AMOUNT>0
GROUP BY Claim_ID
ORDER BY Claim_ID
Claims_Group2 contains these fields:
REMITTANCE_UUID,
CLAIM_ID,
BILLED_AMOUNT,
CREATE_DATETIME
Here are the 2 rows that are currently giving me the problem. They're both remits for the SAME CLAIM, with the SAME TIMESTAMP. I only want one of them in the Remits_To_Activate table, so only ONE remittance will be "activated" per claim.
You can change your query like this:
SELECT
p.*, latest_remit.DATE_OF_LATEST_REMIT
FROM
Remittance AS p inner join
(SELECT MAX(create_datetime) as DATE_OF_LATEST_REMIT,
claim_id
FROM Claims_Group2
WHERE BILLED_AMOUNT>0
GROUP BY Claim_ID) as latest_remit
on latest_remit.claim_id = p.claim_id;
This will give you only one row. Untested (so please run and make changes).
Without having more information on the structure of your database -- especially the structure of Claims_Group2 and REMITTANCE, and the relationship between them, it's not really possible to advise you on how to introduce a remittance UUID into DATE_OF_LATEST_REMIT.
Since you are using SQL Server, however, it is possible to use a window function to introduce a synthetic means to choose among remittances having the same timestamp. For example, it looks like you could approach the problem something like this:
select *
from (
select
r.*,
row_number() over (partition by cg2.claim_id order by cg2.create_datetime desc) as rn
from
remittance r
join claims_group2 cg2
on r.remittance_uuid = cg2.remittance_uuid
where
r.active = 0
and r.billed_amount > 0
and cg2.active = 0
and cg2.billed_amount > 0
) t
where t.rn = 1
Note that this does not depend on your DATE_OF_LATEST_REMIT table at all, it having been subsumed into the inline view. Note also that this will introduce one extra column into your results, though you could avoid that by enumerating the columns of table remittance in the outer select clause.
It also seems odd to be filtering on two sets of active and billed_amount columns, but that appears to follow from what you were doing in your original queries. In that vein, I urge you to check the results carefully, as lifting the filter conditions on cg2 columns up to the level of the join to remittance yields a result that may return rows that the original query did not (but never more than one per claim_id).
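A small sketch of the tie-breaking idea, run on SQLite through Python rather than SQL Server. The rows and UUID strings are invented, and adding remittance_uuid as a second sort key is an assumption made here so that the choice between identical timestamps is repeatable:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE claims_group2 (remittance_uuid TEXT PRIMARY KEY,
                            claim_id INTEGER,
                            create_datetime TEXT);
-- Claim 1 has two remittances with exactly the same timestamp.
INSERT INTO claims_group2 VALUES
  ('uuid-a', 1, '2018-08-15 13:07:50.933'),
  ('uuid-b', 1, '2018-08-15 13:07:50.933'),
  ('uuid-c', 1, '2018-01-01 09:00:00.000'),
  ('uuid-d', 2, '2017-12-31 10:00:00.000');
""")

# rn = 1 marks the latest remittance per claim; the UUID breaks timestamp ties.
latest = conn.execute("""
SELECT claim_id, remittance_uuid
FROM (
  SELECT claim_id, remittance_uuid,
         ROW_NUMBER() OVER (PARTITION BY claim_id
                            ORDER BY create_datetime DESC, remittance_uuid) AS rn
  FROM claims_group2
) t
WHERE rn = 1
ORDER BY claim_id
""").fetchall()

print(latest)
```

Without the second sort key, either of the tied rows could be numbered 1.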
A co-worker offered me this elegant demonstration of a solution. I'd never used "over" or "partition" before. Works great! Thank you John and Gaurasvsa for your input.
if OBJECT_ID('tempdb..#t') is not null
drop table #t
select *, ROW_NUMBER() over (partition by CLAIM_ID order by CLAIM_ID) as ROW_NUM
into #t
from
(
select '2018-08-15 13:07:50.933' as CREATE_DATE, 1 as CLAIM_ID, NEWID() as REMIT_UUID
union select '2018-08-15 13:07:50.933', 1, NEWID()
union select '2017-12-31 10:00:00.000', 2, NEWID()
) x
select *
from #t
order by CLAIM_ID, ROW_NUM
select CREATE_DATE, MAX(CLAIM_ID), MAX(REMIT_UUID)
from #t
where ROW_NUM = 1
group by CREATE_DATE

distinct value per column

I am looking at a report on policy exceptions based on various criteria such as Beacon Score, Debt to Income, and Loan to Value. This information is kept in multiple different tables, and right now the Loan to Value column is causing multiple entries in my report because a specific loan might have multiple pieces of collateral. For proper exception monitoring, I only need one entry.
With all that said, how might I execute the following code, with a distinct value for dbo.Folders.Id? Just putting 'DISTINCT' after the SELECT statement does not seem to work. (Sensitive values masked with '#'.)
SELECT dbo.Folders.LoanOfficerId,
dbo.Folders.Id,
dbo.CollateralType.Description,
dbo.Customers.CUSTNAME,
dbo.Folders.DateLoanActivated,
dbo.Folders.CurrentAccountBalance,
dbo.Folders.UnadvancedCommitAmount,
dbo.Folders.BeaconScore,
dbo.Folders.DebtToIncome,
dbo.Collateral.LoanToValue
FROM dbo.Folders
INNER JOIN dbo.Customers
ON dbo.Folders.CustomersNAMEKEY = dbo.Customers.NAMEKEY
INNER JOIN dbo.Collateral
ON dbo.Folders.Id = dbo.Collateral.FoldersID
INNER JOIN dbo.CollateralType
ON dbo.Collateral.CollateralTypeCollCode = dbo.CollateralType.CollCode
WHERE ( (dbo.Folders.BeaconScore < ###)
AND (dbo.Folders.BeaconScore > ###)
AND (dbo.Folders.CloseCode = 'O')
AND (dbo.Folders.CollateralCode <> ##)
)
OR ( (dbo.Folders.CloseCode = 'O')
AND (dbo.Folders.CustomerType <> '###')
AND (dbo.Folders.CustomerType <> '###')
AND (dbo.Folders.DebtToIncome > ##)
)
OR ( (dbo.Folders.CloseCode = 'O')
AND (dbo.Folders.CustomerType = '###')
AND (dbo.Folders.DebtToIncome > ##)
)
OR ( (dbo.Folders.CloseCode = 'O')
AND (dbo.Folders.CustomerType = '###')
AND (dbo.Folders.DebtToIncome > ##)
)
OR (dbo.Collateral.LoanToValue > dbo.CollateralType.LTV)
Any constructive criticism on my code is welcome. (Static values in the above statement are on the docket to be corrected later with a thresholds/criteria table.) From what I have seen, others have suggested using ROW_COUNT() with PARTITION, but I am unable to make the syntax work.
Comment about formatting: learn to use table aliases. They make the query easier to read and write.
If you only need one row from the results, you can use row_number(). This enumerates the rows for each folder (in your case) and you would just use the first one. You can do this using:
with t as (
<your query here>
)
select t.*
from (select t.*,
row_number() over (partition by Id order by (select NULL)) as seqnum
from t
) t
where seqnum = 1;
On the other hand, if you needed to aggregate information from the collateral tables, then you would use group by in your query with the appropriate aggregation functions.
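The aggregation alternative might look like the sketch below, which collapses multiple collateral rows into one row per folder. It uses SQLite via Python instead of SQL Server, keeps only a few hypothetical columns, and picks MAX(LoanToValue) purely as an example aggregate; which aggregate is appropriate depends on the report.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Folders (Id INTEGER PRIMARY KEY, BeaconScore INTEGER);
CREATE TABLE Collateral (FoldersID INTEGER, LoanToValue REAL);
-- Invented data: folder 1 has two pieces of collateral.
INSERT INTO Folders VALUES (1, 600), (2, 710);
INSERT INTO Collateral VALUES (1, 0.80), (1, 0.95), (2, 0.70);
""")

# One row per folder, with the collateral rows summarized by aggregates.
per_folder = conn.execute("""
SELECT f.Id, f.BeaconScore,
       COUNT(*)           AS collateral_count,
       MAX(c.LoanToValue) AS max_ltv
FROM Folders f
JOIN Collateral c ON c.FoldersID = f.Id
GROUP BY f.Id, f.BeaconScore
ORDER BY f.Id
""").fetchall()

print(per_folder)
```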

Sql Server Query Selecting Top and grouping by

SpousesTable
SpouseID
SpousePreviousAddressesTable
PreviousAddressID, SpouseID, FromDate, AddressTypeID
What I have now updates only the single most recent row in the whole table, setting AddressTypeID = 1 regardless of SpouseID.
I want to set AddressTypeID = 1 on the most recent SpousePreviousAddresses row for each unique SpouseID in the SpousePreviousAddresses table.
UPDATE spa
SET spa.AddressTypeID = 1
FROM SpousePreviousAddresses AS spa INNER JOIN Spouses ON spa.SpouseID = Spouses.SpouseID,
(SELECT TOP 1 SpousePreviousAddresses.* FROM SpousePreviousAddresses
INNER JOIN Spouses AS s ON SpousePreviousAddresses.SpouseID = s.SpouseID
WHERE SpousePreviousAddresses.CountryID = 181 ORDER BY SpousePreviousAddresses.FromDate DESC) as us
WHERE spa.PreviousAddressID = us.PreviousAddressID
I think I need a group by but my sql isn't all that hot. Thanks.
Update that is Working
I was wrong about having found a solution to this earlier. Below is the solution I am going with
WITH result AS
(
SELECT ROW_NUMBER() OVER (PARTITION BY SpouseID ORDER BY FromDate DESC) AS rowNumber, *
FROM SpousePreviousAddresses
WHERE CountryID = 181
)
UPDATE result
SET AddressTypeID = 1
FROM result WHERE rowNumber = 1
Presuming you are using SQL Server 2005 (based on the error message you got from the previous attempt), probably the most straightforward way to do this is to use the ROW_NUMBER() function coupled with a Common Table Expression. I think this might do what you are looking for:
WITH result AS
(
SELECT
ROW_NUMBER() OVER (PARTITION BY SpouseID ORDER BY FromDate DESC) as rowNumber,
*
FROM
SpousePreviousAddresses
)
UPDATE SpousePreviousAddresses
SET
AddressTypeID = 2
FROM
SpousePreviousAddresses spa
INNER JOIN result r ON spa.SpouseId = r.SpouseId
WHERE r.rowNumber = 1
AND spa.PreviousAddressID = r.PreviousAddressID
AND spa.CountryID = 181
In SQL Server 2005 the ROW_NUMBER() function is one of the most powerful around. It is very useful in lots of situations. The time spent learning about it will be repaid many times over.
The CTE is used to simplify the code a bit, as it removes the need for a temporary table of some kind to store the intermediate result.
The resulting query should be fast and efficient. I know the select in the CTE uses *, which is a bit of overkill as we don't need all the columns, but it may help anyone who wants to see what is happening inside the query.
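For readers who want to try the CTE plus ROW_NUMBER() pattern without a SQL Server instance, here is a rough equivalent on SQLite via Python. SQLite cannot update through a CTE the way SQL Server can, so this sketch updates the base table and uses the CTE only to pick the key of the newest row per spouse; the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE SpousePreviousAddresses (
  PreviousAddressID INTEGER PRIMARY KEY,
  SpouseID INTEGER, FromDate TEXT, CountryID INTEGER, AddressTypeID INTEGER);
INSERT INTO SpousePreviousAddresses VALUES
  (1, 1, '2001-01-01', 181, 0),
  (2, 1, '2005-06-01', 181, 0),
  (3, 2, '2003-03-01', 181, 0),
  (4, 2, '2008-09-01', 44,  0);   -- newer, but in a different country
""")

# Number the CountryID = 181 rows per spouse, newest first, then flag row 1.
conn.execute("""
WITH result AS (
  SELECT PreviousAddressID,
         ROW_NUMBER() OVER (PARTITION BY SpouseID
                            ORDER BY FromDate DESC) AS rowNumber
  FROM SpousePreviousAddresses
  WHERE CountryID = 181
)
UPDATE SpousePreviousAddresses
SET AddressTypeID = 1
WHERE PreviousAddressID IN (SELECT PreviousAddressID FROM result WHERE rowNumber = 1)
""")

updated = [r[0] for r in conn.execute(
    "SELECT PreviousAddressID FROM SpousePreviousAddresses "
    "WHERE AddressTypeID = 1 ORDER BY PreviousAddressID")]
print(updated)
```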
Here's one way to do it:
UPDATE spa1
SET spa1.AddressTypeID = 1
FROM SpousePreviousAddresses AS spa1
LEFT OUTER JOIN SpousePreviousAddresses AS spa2
ON (spa1.SpouseID = spa2.SpouseID AND spa1.FromDate < spa2.FromDate)
WHERE spa1.CountryID = 181 AND spa2.SpouseID IS NULL;
In other words, update the row spa1 for which no other row spa2 exists with the same spouse and a greater (more recent) date.
There's exactly one row for each value of SpouseID that has the greatest date compared to all other rows (if any) with the same SpouseID.
There's no need to use a GROUP BY, because there's kind of an implicit grouping done by the join.
update: I think you misunderstand the purpose of the OUTER JOIN. If there is no row spa2 that matches all the join conditions, then all columns of spa2.* are returned as NULL. That's how outer joins work. So you can search for the cases where spa1 has no matching row spa2 by testing that spa2.SpouseID IS NULL.
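The outer-join test can be seen in isolation with a SELECT. The sketch below runs on SQLite via Python with invented rows, and returns, for each spouse, the row for which no later row exists:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE SpousePreviousAddresses (
  PreviousAddressID INTEGER PRIMARY KEY, SpouseID INTEGER, FromDate TEXT);
INSERT INTO SpousePreviousAddresses VALUES
  (1, 1, '2001-01-01'),
  (2, 1, '2005-06-01'),
  (3, 2, '2003-03-01');
""")

# spa2 columns are NULL exactly when no row with a later FromDate matched.
latest = conn.execute("""
SELECT spa1.SpouseID, spa1.PreviousAddressID
FROM SpousePreviousAddresses AS spa1
LEFT OUTER JOIN SpousePreviousAddresses AS spa2
  ON spa1.SpouseID = spa2.SpouseID AND spa1.FromDate < spa2.FromDate
WHERE spa2.SpouseID IS NULL
ORDER BY spa1.SpouseID
""").fetchall()

print(latest)
```

If two rows for the same spouse share the greatest FromDate, both survive the filter, so a tie-breaker would be needed in that case.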
UPDATE spa SET spa.AddressTypeID = 1
WHERE spa.SpouseID IN (
SELECT DISTINCT s1.SpouseID FROM Spa S1, SpousePreviousAddresses S2
WHERE s1.SpouseID = s2.SpouseID
AND s2.CountryID = 181
AND s1.PreviousAddressId = s2.PreviousAddressId
ORDER BY S2.FromDate DESC)
Just a guess.