Sum half of the remaining rows of a partition, starting from the current position - sql

I'm trying to achieve the following results:
Right now, the grouping comes from:
SUM(CASE WHEN seqnum <= (0.5 * seqnum_rev) THEN i.[P&L] END) OVER(PARTITION BY i.bracket_label ORDER BY i.event_id) AS [P&L 50%],
For each row, I need to count the rows from its position to the end of the partition (seq_inv) and sum the P&L amounts for only half of those rows, starting from that position.
For example, when seq = 2, seq_inv = 13 and half of that is 6, so I need to sum the following 6 positions from seq = 2.
When seq = 4 there are 11 positions to the end (seq_inv = 11), so half is 5, and I want to sum 5 positions from seq = 4.
I hope this makes sense, I'm trying to come up with a rule that will be able to adapt to the case I have, since the partition by is what gives me the numbers that need to be summed.
I was also thinking if there was something to do with a partition by top 50% or something like that, but I guess that doesn't exist.

I have the advantage that I've helped him before and have a little extra context.
That context is that this is just the later stage of a very long chain of common table expressions. That means self-joins and/or correlated sub-queries are unfortunately expensive.
Preferably, this should be answerable using window functions, as the data set is already available in the appropriate ordering and partitioning.
My reading is this...
The SUM(5:9) (meaning the sum of rows 5 to row 9, inclusive) is equal to SUM(5:end) - SUM(10:end)
That leads me to this...
WITH
cumulative AS
(
SELECT
*,
SUM([P&L]) OVER (PARTITION BY bracket_label ORDER BY event_id DESC) AS cumulative_p_and_l
FROM
data
)
SELECT
*,
cumulative_p_and_l - LEAD(cumulative_p_and_l, seq_inv/2, 0) OVER (PARTITION BY bracket_label ORDER BY event_id) AS p_and_l_50_perc,
cumulative_p_and_l - LEAD(cumulative_p_and_l, seq_inv/4, 0) OVER (PARTITION BY bracket_label ORDER BY event_id) AS p_and_l_25_perc
FROM
cumulative
NOTE: Using spaces, &, or % in column names is horrendous, don't do it ;)
EDIT: Corrected the ORDER BY in the cumulative sum.
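To sanity-check the identity behind this (a sum over a run of rows equals one suffix sum minus a later suffix sum), here is a minimal Python sketch for a single partition. The function name and sample amounts are made up for illustration, and it assumes seq_inv counts the current row, so each row sums seq_inv/2 rows starting at itself, which is what the LEAD offset above does.

```python
from itertools import accumulate

def half_window_sums(amounts):
    """For each row, sum the first half (integer division) of the rows
    from that row to the end, using suffix sums instead of a nested loop."""
    n = len(amounts)
    # suffix[i] = sum(amounts[i:]); the extra trailing 0 plays the role
    # of LEAD's default value when the offset runs past the last row.
    suffix = list(accumulate(reversed(amounts)))[::-1] + [0]
    result = []
    for i in range(n):
        seq_inv = n - i          # rows from the current one to the end
        half = seq_inv // 2      # same as seq_inv/2 with integer columns
        result.append(suffix[i] - suffix[i + half])
    return result

amounts = [10, -4, 7, 3, -2, 5]  # hypothetical P&L values, ordered by event_id
sums = half_window_sums(amounts)

# Brute-force check: slice out the half-window directly.
brute = [sum(amounts[i:i + (len(amounts) - i) // 2]) for i in range(len(amounts))]
assert sums == brute
```

The last row has seq_inv = 1, so its half-window is empty and the result is 0, matching what the default of 0 in LEAD produces.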

I don't think that window functions can do what you want. You could use a correlated subquery instead, with the following logic:
select
t.*,
(
select sum(t1.[P&L])
from mytable t1
where t1.seq - t.seq between 0 and t.seq_inv/2
) [P&L 50%]
from mytable t

Related

Sql Islands and Gaps Merge Contiguous records if relevant fields hold same values

I have created a test case here for my problem https://rextester.com/ZRXSQ14415
It's much easier to show the problem than to explain what I am trying to achieve.
I have a list of records across time and I wish to merge contiguous records into a single record.
Each record has a period Date, Risk Levels and a couple of flags. When these risks and flags are the same the records should be merged when they are different then they should be a separate row.
In the Rextester example, I have almost achieved my goal; however, look at rows 3 + 4 of the result.
What I want is for rows 3 + 4 to be combined, so that row 3 reads:
StartDate End Date Name ... ...
17.03.2019 20.03.2019 CPWJ40-A ... ...
As all flags and risk levels are the same.
Change the SEQ expression to
..
ROW_NUMBER() OVER (ORDER BY PeriodDate) - ROW_NUMBER() OVER (Partition BY ImplicitRisk,QCReadyRisk,IsQualityControlReady, ActivePeriod ORDER BY PeriodDate) AS SEQ
..
This way you'll get the proper grouping of islands of ImplicitRisk,QCReadyRisk,IsQualityControlReady, ActivePeriod.
This answer is purely to complement Serg's answer with the full query.
SELECT MIN(d.PeriodDate) AS StartDate,
MAX(d.PeriodDate) AS EndDate,
ImplicitRisk,
QcReadyRisk,
IsQualityControlReady,
ActivePeriod,
LocationEventName
FROM
(
SELECT c.*,
ROW_NUMBER() OVER (ORDER BY PeriodDate) - ROW_NUMBER() OVER (Partition BY LocationEventId, ImplicitRisk, QCReadyRisk, IsQualityControlReady, ActivePeriod ORDER BY PeriodDate) AS grp
FROM tab c
--order by PeriodDate
) d
group by ImplicitRisk, QcReadyRisk, IsQualityControlReady, ActivePeriod, LocationEventName, grp
order by 1
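For anyone who wants to watch the row-number difference form the groups, here is a small sketch that runs the same idea in SQLite through Python (window functions need SQLite 3.25 or later). The two-column table and its rows are invented for illustration, with a single Risk column standing in for the full set of risk and flag columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, simplified stand-in for the real table: one flag column only.
conn.execute("CREATE TABLE tab (PeriodDate TEXT, Risk TEXT)")
conn.executemany(
    "INSERT INTO tab VALUES (?, ?)",
    [
        ("2019-03-15", "A"),
        ("2019-03-16", "A"),
        ("2019-03-17", "B"),
        ("2019-03-18", "B"),
        ("2019-03-19", "B"),
        ("2019-03-20", "A"),  # same value as the first island, but not contiguous
    ],
)

islands = conn.execute("""
    SELECT MIN(PeriodDate) AS StartDate,
           MAX(PeriodDate) AS EndDate,
           Risk
    FROM (
        SELECT PeriodDate, Risk,
               -- overall position minus position within the same value:
               -- constant inside a contiguous run, different across runs
               ROW_NUMBER() OVER (ORDER BY PeriodDate)
             - ROW_NUMBER() OVER (PARTITION BY Risk ORDER BY PeriodDate) AS grp
        FROM tab
    ) AS d
    GROUP BY Risk, grp
    ORDER BY StartDate
""").fetchall()

for row in islands:
    print(row)
```

Note that this treats rows as contiguous purely by their order; a missing date between two rows with the same values would not split the island.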

How to track iterations of a value in sql

I am trying to track usage of a blade in a manufacturing process using SSMS 2017. The blade is loaded and used on product until it is seen to dull and then taken out for sharpening while another blade replaces it. We have 30 blades that are cycled for use and sharpening.
Using a table that provides product lot number (sequential) and blade name I would like to separate each batch use of the blade name into groups.
My sql skills are pretty basic so I've been trying row_number, rank, and some attempts at utilizing the lead/lag functions. So far this has only enabled me to break down each product into order based on blade name and identify the product on which a blade change is made. I feel like that could be useful but I'm having trouble figuring out exactly how to do it.
I would like to be able to assign an identifying number to each group of product manufactured with one iteration of a blade. For example:
LotNo BladeID Iteration
418211 BH40 1
418212 BH40 1
418213 BH40 1
418214 ES11 2
418215 ES11 2
418216 BH40 3
I'm currently able to produce these incorrect results:
Using:
SELECT b.LotNo,
b.BladeID,
ROW_NUMBER() OVER (PARTITION BY b.BladeID ORDER BY b.BladeID)
FROM blades AS b
ORDER BY b.LotNo ASC;
I get:
LotNo BladeID Iteration
418211 BH40 1
418212 BH40 2
418213 BH40 3
418214 ES11 1
418215 ES11 2
418216 BH40 4
Here's a possible solution to your problem. It first creates a series of groups to identify when the number must change. Then it gets an order to assign the correct value to each group. And finally, it assigns the value for the iteration. I'm including the sample data in a consumable way so anyone can use it for testing purposes.
CREATE TABLE #Sample(
LotNo int,
BladeID varchar(10),
Iteration int
);
INSERT INTO #Sample
VALUES
(418211, 'BH40', 1),
(418212, 'BH40', 1),
(418213, 'BH40', 1),
(418214, 'ES11', 2),
(418215, 'ES11', 2),
(418216, 'BH40', 3);
GO
WITH cteGroups AS(
SELECT *,
ROW_NUMBER() OVER(ORDER BY LotNo) - ROW_NUMBER() OVER(PARTITION BY BladeID ORDER BY LotNo) AS island
FROM #Sample
),
cteOrdering AS(
SELECT *, MIN( LotNo) OVER( PARTITION BY island, BladeID) AS OrderCol
FROM cteGroups
)
SELECT LotNo,
BladeID,
Iteration,
DENSE_RANK() OVER( ORDER BY OrderCol) AS IterationCalc
FROM cteOrdering;
You can do this with lag() and a cumulative sum:
select s.*,
sum(case when prev_BladeID = BladeId then 0 else 1 end) over (order by LotNo) as Iteration
from (select s.*,
lag(s.BladeID) over (order by s.LotNo) as prev_BladeID
from #sample s
) s;
In addition to being simpler code than the difference of row numbers, I think this is also simpler conceptually. This is simply counting the number of times that the BladeID changes from one lot to the next.
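As a quick check of the lag-plus-running-sum idea, the sketch below runs it against the sample lots in SQLite via Python (window functions need SQLite 3.25 or later). The table name blades is taken from the question; the rest is an illustrative setup, not the original environment.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE blades (LotNo INTEGER, BladeID TEXT)")
conn.executemany(
    "INSERT INTO blades VALUES (?, ?)",
    [
        (418211, "BH40"),
        (418212, "BH40"),
        (418213, "BH40"),
        (418214, "ES11"),
        (418215, "ES11"),
        (418216, "BH40"),
    ],
)

rows = conn.execute("""
    SELECT LotNo, BladeID,
           -- add 1 every time the blade differs from the previous lot;
           -- the first lot has a NULL predecessor, so it also counts as a change
           SUM(CASE WHEN prev_BladeID = BladeID THEN 0 ELSE 1 END)
               OVER (ORDER BY LotNo) AS Iteration
    FROM (
        SELECT LotNo, BladeID,
               LAG(BladeID) OVER (ORDER BY LotNo) AS prev_BladeID
        FROM blades
    ) AS s
    ORDER BY LotNo
""").fetchall()

iterations = [iteration for _, _, iteration in rows]
print(iterations)
```

The comparison with NULL on the first row falls through to the ELSE branch, which is why the numbering starts at 1 without any special casing.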

Access 2013 - Query not returning correct Number of Results

I am trying to get the query below to return the TWO lowest PlayedTo results for each PlayerID.
select
x1.PlayerID, x1.RoundID, x1.PlayedTo
from P_7to8Calcs as x1
where
(
select count(*)
from P_7to8Calcs as x2
where x2.PlayerID = x1.PlayerID
and x2.PlayedTo <= x1.PlayedTo
) <3
order by PlayerID, PlayedTo, RoundID;
Unfortunately at the moment it doesn't return a result when there is a tie for one of the lowest scores. A copy of the dataset and code is here http://sqlfiddle.com/#!3/4a9fc/13.
PlayerID 47 has only one result returned because two different RoundIDs are tied for the second lowest PlayedTo. For what I am trying to calculate it doesn't matter which of the two is returned, as I just need to know the number, but for reporting I would ideally get the one with the newest date.
One other slight problem with the query is the time it takes to run. It takes about 2 minutes in Access to run through the 83 records but it will need to run on about 1000 records when the database is fully up and running.
Any help will be much appreciated.
Resolve the tie by adding DatePlayed to your internal sorting (you wanted the one with the newest date anyway):
select
x1.PlayerID, x1.RoundID
, x1.PlayedTo
from P_7to8Calcs as x1
where
(
select count(*)
from P_7to8Calcs as x2
where x2.PlayerID = x1.PlayerID
and (x2.PlayedTo < x1.PlayedTo
or x2.PlayedTo = x1.PlayedTo
and x2.DatePlayed >= x1.DatePlayed
)
) <3
order by PlayerID, PlayedTo, RoundID;
For performance create an index supporting the join condition. Something like:
create index P_7to8Calcs__PlayerID_RoundID on P_7to8Calcs(PlayerId, PlayedTo);
Note: I used your SQLFiddle as I do not have Access available here.
Edit: In case the index does not improve performance enough, you might want to try the following query using window functions (which avoids nested sub-query). It works in your SQLFiddle but I am not sure if this is supported by Access.
select x1.PlayerID, x1.RoundID, x1.PlayedTo
from (
select PlayerID, RoundID, PlayedTo
, RANK() OVER (PARTITION BY PlayerId ORDER BY PlayedTo, DatePlayed DESC) AS Rank
from P_7to8Calcs
) as x1
where x1.RANK < 3
order by PlayerID, PlayedTo, RoundID;
See OVER clause and Ranking Functions for documentation.
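To see why the tie drops a row and how the DatePlayed tiebreak brings it back, here is a small sketch that runs both correlated-subquery versions in SQLite via Python. The four rows for PlayerID 47 are invented for illustration; only the table and column names come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE P_7to8Calcs "
    "(PlayerID INTEGER, RoundID INTEGER, PlayedTo INTEGER, DatePlayed TEXT)"
)
conn.executemany(
    "INSERT INTO P_7to8Calcs VALUES (?, ?, ?, ?)",
    [
        (47, 1, 5, "2014-01-01"),
        (47, 2, 7, "2014-01-05"),  # tied for second lowest
        (47, 3, 7, "2014-01-10"),  # tied for second lowest, newest date
        (47, 4, 9, "2014-01-15"),
    ],
)

QUERY = """
    SELECT x1.PlayerID, x1.RoundID, x1.PlayedTo
    FROM P_7to8Calcs AS x1
    WHERE (
        SELECT COUNT(*)
        FROM P_7to8Calcs AS x2
        WHERE x2.PlayerID = x1.PlayerID
          AND {condition}
    ) < 3
    ORDER BY PlayerID, PlayedTo, RoundID
"""

# Original condition: both tied rows count each other, so each sees 3 rows.
original = conn.execute(
    QUERY.format(condition="x2.PlayedTo <= x1.PlayedTo")
).fetchall()

# Tiebreak on DatePlayed: among equal scores only newer-or-same rows count.
fixed = conn.execute(
    QUERY.format(condition="""(x2.PlayedTo < x1.PlayedTo
           OR x2.PlayedTo = x1.PlayedTo
          AND x2.DatePlayed >= x1.DatePlayed)""")
).fetchall()

print(original)
print(fixed)
```

With the tiebreak, the newer of the two tied rounds is counted as rank 2 and the older one as rank 3, so exactly two rows come back.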

Select finishes where athlete didn't finish first for the past 3 events

Suppose I have a database of athletic meeting results with a schema as follows
DATE,NAME,FINISH_POS
I wish to do a query to select all rows where an athlete has competed in at least three events without winning. For example with the following sample data
2013-06-22,Johnson,2
2013-06-21,Johnson,1
2013-06-20,Johnson,4
2013-06-19,Johnson,2
2013-06-18,Johnson,3
2013-06-17,Johnson,4
2013-06-16,Johnson,3
2013-06-15,Johnson,1
The following rows:
2013-06-20,Johnson,4
2013-06-19,Johnson,2
Would be matched. I have only managed to get started at the following stub:
select date,name FROM table WHERE ...;
I've been trying to wrap my head around the where clause but I can't even get a start
I think this can be even simpler / faster:
SELECT day, place, athlete
FROM (
SELECT *, min(place) OVER (PARTITION BY athlete
ORDER BY day
ROWS 3 PRECEDING) AS best
FROM t
) sub
WHERE best > 1
->SQLfiddle
Uses the aggregate function min() as window function to get the minimum place of the last three rows plus the current one.
The then trivial check for "no win" (best > 1) has to be done on the next query level since window functions are applied after the WHERE clause. So you need at least one CTE or sub-select for a condition on the result of a window function.
Details about window function calls in the manual here. In particular:
If frame_end is omitted it defaults to CURRENT ROW.
If place (finishing_pos) can be NULL, use this instead:
WHERE best IS DISTINCT FROM 1
min() ignores NULL values, but if all rows in the frame are NULL, the result is NULL.
Don't use type names and reserved words as identifiers, I substituted day for your date.
This assumes at most 1 competition per day, else you have to define how to deal with peers in the time line or use timestamp instead of date.
@Craig already mentioned the index to make this fast.
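For a quick check against the sample data from the question, the sketch below runs the same min() window in SQLite via Python (window functions need SQLite 3.25 or later). The table name t and the column names day, athlete and place follow the query above; the frame is spelled out in full, which is equivalent to ROWS 3 PRECEDING.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (day TEXT, athlete TEXT, place INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        ("2013-06-22", "Johnson", 2),
        ("2013-06-21", "Johnson", 1),
        ("2013-06-20", "Johnson", 4),
        ("2013-06-19", "Johnson", 2),
        ("2013-06-18", "Johnson", 3),
        ("2013-06-17", "Johnson", 4),
        ("2013-06-16", "Johnson", 3),
        ("2013-06-15", "Johnson", 1),
    ],
)

no_win_rows = conn.execute("""
    SELECT day, athlete, place
    FROM (
        SELECT day, athlete, place,
               -- best finish over the current row and the three before it
               MIN(place) OVER (PARTITION BY athlete
                                ORDER BY day
                                ROWS BETWEEN 3 PRECEDING AND CURRENT ROW) AS best
        FROM t
    ) AS sub
    WHERE best > 1
    ORDER BY day DESC
""").fetchall()

for row in no_win_rows:
    print(row)
```

Only the rows for 2013-06-20 and 2013-06-19 survive the filter, which matches the expected output in the question. Rows with fewer than three predecessors still get a (shorter) frame, so the earliest rows of an athlete would pass the check if none of them were a win.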
Here's an alternative formulation that does the work in two scans without subqueries:
SELECT
"date", athlete, place
FROM (
SELECT
"date",
place,
athlete,
1 <> ALL (array_agg(place) OVER w) AS include_row
FROM Table1
WINDOW w AS (PARTITION BY athlete ORDER BY "date" ASC ROWS BETWEEN 3 PRECEDING AND CURRENT ROW)
) AS history
WHERE include_row;
See: http://sqlfiddle.com/#!1/fa3a4/34
The logic here is pretty much a literal translation of the question. Get the last four placements - current and the previous 3 - and return any rows in which the athlete didn't finish first in any of them.
Because the window frame is the only place where the number of rows of history to consider is defined, you can parameterise this variant unlike my previous effort (obsolete, http://sqlfiddle.com/#!1/fa3a4/31), so it works for the last n for any n. It's also a lot more efficient than the last try.
I'd be really interested in the relative efficiency of this vs @Andomar's query when executed on a dataset of non-trivial size. They're pretty much exactly the same on this tiny dataset. An index on Table1(athlete, "date") would be required for this to perform optimally on a large data set.
; with CTE as
(
select row_number() over (partition by athlete order by date) rn
, *
from Table1
)
select *
from CTE cur
where not exists
(
select *
from CTE prev
where prev.place = 1
and prev.athlete = cur.athlete
and prev.rn between cur.rn - 3 and cur.rn
)
Live example at SQL Fiddle.

SQL max multiple columns

I am trying to display the maximum value of a specific value, and the corresponding timestamp for the value. I have the command working properly, but unfortunately, if the value is at the maximum value for more than one time period, it displays all of the timestamps. This can be cumbersome with multiple targets as well. Here is what I am using now:
select target_name,value,collection_timestamp
from (select target_name,value,collection_timestamp,
max(value) over (partition by target_name) max_value
from mgmt$metric_details
where target_type='host' and metric_name='TotalDiskUsage'
and column_label='Total Disk Utilized (%) (across all local filesystems)'
)
where value=max_value;
I want to utilize the same kind of command (trying to avoid inner joins etc, because of the lack of bandwidth)....but only show 1 max value/timestamp per target_name. Is there a way to coordinate a group by or limit function into this, without breaking it? I am somewhat unfamiliar with SQL, so this is all new territory.
Your query is so close. Instead of doing the max, do a row_number():
select target_name,value,collection_timestamp
from (select target_name,value,collection_timestamp,
row_number() over (partition by target_name order by value desc) as seqnum
from mgmt$metric_details
where target_type='host' and metric_name='TotalDiskUsage'
and column_label='Total Disk Utilized (%) (across all local filesystems)'
)
where seqnum = 1
This orders everything in the partition by value. You want the one largest value, so order by descending value and take the first in the sequence.
Use ROW_NUMBER() function instead of MAX() and appropriate ORDER BY in the window to resolve the ties:
select target_name,value,collection_timestamp
from (select target_name,value,collection_timestamp,
ROW_NUMBER() OVER (partition by target_name
ORDER BY value DESC,
collection_timestamp DESC )
AS rn
from mgmt$metric_details
where target_type='host' and metric_name='TotalDiskUsage'
and column_label='Total Disk Utilized (%) (across all local filesystems)'
)
where rn = 1 ;
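Here is a small sketch of the ROW_NUMBER() approach with the timestamp tiebreak, run in SQLite via Python (window functions need SQLite 3.25 or later). The table name metrics and its rows are hypothetical stand-ins for mgmt$metric_details, with the filter columns left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, trimmed-down stand-in for mgmt$metric_details.
conn.execute(
    "CREATE TABLE metrics (target_name TEXT, value REAL, collection_timestamp TEXT)"
)
conn.executemany(
    "INSERT INTO metrics VALUES (?, ?, ?)",
    [
        ("host1", 90.0, "2013-05-01 10:00"),
        ("host1", 95.0, "2013-05-01 11:00"),  # maximum, first occurrence
        ("host1", 95.0, "2013-05-01 12:00"),  # maximum again, later timestamp
        ("host2", 80.0, "2013-05-01 10:00"),
    ],
)

top_rows = conn.execute("""
    SELECT target_name, value, collection_timestamp
    FROM (
        SELECT target_name, value, collection_timestamp,
               -- highest value first; among ties, the latest timestamp first
               ROW_NUMBER() OVER (PARTITION BY target_name
                                  ORDER BY value DESC,
                                           collection_timestamp DESC) AS rn
        FROM metrics
    ) AS ranked
    WHERE rn = 1
    ORDER BY target_name
""").fetchall()

for row in top_rows:
    print(row)
```

Each target comes back exactly once. Swapping the second sort key to ascending would return the first time the maximum was reached instead of the latest.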