Finding the Nth previous row based on criteria - dynamic LAG - sql

I have this SQL challenge.
I have a table that looks like this (the site would not let me post the image inline, so please use the link).
The challenge is to identify the nearest (N-th) previous row where NEW_BATCH_FLAG = 1 and to "spread" the "date of current transaction" value of THAT row across all the following rows where NEW_BATCH_FLAG = 0 (until we meet the next row with NEW_BATCH_FLAG = 1).
As you can (hopefully :)) see in the image, the first row has NEW_BATCH_FLAG = 1, so I should "spread" its date (1-jun-2020) to the next 2 rows where NEW_BATCH_FLAG = 0. The problem is that for the first one I have to go one row back, and for the second one, 2 rows back.
So the challenge is to calculate, for each given row, how many rows I have to go back until I hit the nearest NEW_BATCH_FLAG = 1 row.
This "distance" will then be used as the offset argument of the LAG function.

One method is to use a scalar sub-query, but it won't be very efficient:
select t1.transaction_date,
t1.new_batch_flag,
case new_batch_flag
when 1 then t1.transaction_date
else (select max(t2.transaction_date)
from the_table t2
where t2.new_batch_flag = 1
and t2.transaction_date < t1.transaction_date)
end as start_date_of_batch
from the_table t1;
With Postgres I would do it like this:
select transaction_date,
new_batch_flag,
case new_batch_flag
when 1 then transaction_date
else max(transaction_date) filter (where new_batch_flag = 1) over (order by transaction_date)
end as start_of_batch
from the_table;

I think you just want a cumulative max with some conditional logic:
select t.*,
max(case when new_batch_flag = 1 then transaction_date end) over (order by transaction_date) as start_date_of_batch
from t;
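For anyone who wants to try the cumulative max on sample data, here is a minimal sketch. It assumes SQLite 3.25 or later (for window functions) driven through Python's sqlite3 module, and the five rows are made up for illustration; only the table and column names come from the question.

```python
import sqlite3

# In-memory database with hypothetical sample rows (not the asker's data).
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table the_table (transaction_date text, new_batch_flag integer)"
)
conn.executemany(
    "insert into the_table values (?, ?)",
    [
        ("2020-06-01", 1),
        ("2020-06-02", 0),
        ("2020-06-03", 0),
        ("2020-06-04", 1),
        ("2020-06-05", 0),
    ],
)

# The CASE yields a date only on batch-start rows; the running MAX then
# carries the latest such date forward to the rows that follow it.
rows = conn.execute(
    """
    select transaction_date,
           new_batch_flag,
           max(case when new_batch_flag = 1 then transaction_date end)
               over (order by transaction_date) as start_date_of_batch
    from the_table
    order by transaction_date
    """
).fetchall()

for row in rows:
    print(row)
```

No LAG offset is needed here: because the dates only increase, the running maximum of the flagged dates is always the start of the current batch.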

Related

How to create a case statement which groups fields?

I am trying to understand how to group values together to add an indicator. I want to 'fix' the values and based on this, attribute an indicator.
The values I am trying to group are date, customer name and product type, to create an indicator which captures what kind of order was placed (fruit only, fruit and vegetable, vegetable only). The goal is to calculate the total volume of each kind of order placed. The data is set out like this, and the column I am trying to create is the 'Order Type'.
What I have done so far:
I originally completed this analysis in Tableau, where I was able to use the 'Fixed' function and sum the value of indicators (for fruit or veggie) to determine each order type individually.
I have written case statements to identify the product type, with the idea that I could sum these to determine the order type (code below). However, this did not work, as I only need one instance of the indicator for each order. To solve this, I have written a case statement which partitions the fields and orders by date to get one instance of an indicator for each order.
Case Statements
CASE WHEN Product_Type = 'Fruit' THEN 1 ELSE 0 END AS Fruit_Indicator
, CASE WHEN Product_Type = 'Vegetable' THEN 1 ELSE 0 END AS Veg_Indicator
Case Statement with partition by and order by
, CASE WHEN ROW_NUMBER() OVER (PARTITION BY Order_Date, Customer ORDER BY Order_Date ASC) = 1 AND Product_Type = 'Fruit' THEN 1 ELSE NULL END AS Fruit_Ind
, CASE WHEN ROW_NUMBER() OVER (PARTITION BY Order_Date, Customer ORDER BY Order_Date ASC) = 1 AND Product_Type = 'Vegetable' THEN 1 ELSE NULL END AS Veg_Ind
I would appreciate any guidance on the right direction.
Thanks!
It APPEARS you are trying to get data grouped by date, such as Mar 21, Mar 22, etc. So you may want a secondary query, aggregated by customer and date, to join the primary data to. If the date field is date/time oriented, you will have to adjust the group by so that it uses only the date (month/day/year) and ignores any time component; a function that extracts just the date part can handle this. Joining your original data to the aggregate should then get you what you need. Maybe something like:
select
yt.date,
yt.customer,
yt.product,
yt.productType,
case when PreQuery.IsFruit > 0 and PreQuery.IsVegetable > 0
then 'Fruit & Vegetable'
when PreQuery.IsFruit > 0 and PreQuery.IsVegetable = 0
then 'Fruit Only'
when PreQuery.IsFruit = 0 and PreQuery.IsVegetable > 0
then 'Vegetable Only' end OrderType
from
YourTable yt
JOIN
( select
yt2.customer,
yt2.date,
max( case when yt2.ProductType = 'Fruit'
then 1 else 0 end ) IsFruit,
max( case when yt2.ProductType = 'Vegetable'
then 1 else 0 end ) IsVegetable
from
YourTable yt2
-- if you want to restrict time period, add a where
-- clause here on the date range as to not query entire table
group by
yt2.customer,
yt2.date ) PreQuery
ON yt.customer = PreQuery.customer
AND yt.date = PreQuery.date
-- same here for your outer query: if you want to restrict the time period,
-- add a where clause here on the date range so as not to query the entire table
order by
yt.date,
yt.customer,
yt.product
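To check the aggregate-then-join idea on a small data set, here is a minimal sketch. It assumes SQLite through Python's sqlite3 module, and the table name orders, its columns and the four rows are hypothetical stand-ins for the asker's data.

```python
import sqlite3

# Hypothetical sample data: one row per product on an order.
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table orders "
    "(order_date text, customer text, product text, product_type text)"
)
conn.executemany(
    "insert into orders values (?, ?, ?, ?)",
    [
        ("2021-03-21", "Ann", "Apple", "Fruit"),
        ("2021-03-21", "Ann", "Carrot", "Vegetable"),
        ("2021-03-21", "Bob", "Pear", "Fruit"),
        ("2021-03-22", "Ann", "Leek", "Vegetable"),
    ],
)

# The inner query collapses each (date, customer) order to two 0/1 flags;
# the outer query joins those flags back onto every detail row.
rows = conn.execute(
    """
    select o.order_date, o.customer, o.product,
           case when p.is_fruit = 1 and p.is_veg = 1 then 'Fruit & Vegetable'
                when p.is_fruit = 1 then 'Fruit Only'
                when p.is_veg = 1 then 'Vegetable Only'
           end as order_type
    from orders o
    join (select order_date, customer,
                 max(case when product_type = 'Fruit' then 1 else 0 end) as is_fruit,
                 max(case when product_type = 'Vegetable' then 1 else 0 end) as is_veg
          from orders
          group by order_date, customer) p
      on p.order_date = o.order_date
     and p.customer = o.customer
    order by o.order_date, o.customer, o.product
    """
).fetchall()

for row in rows:
    print(row)
```

Because the flags are computed once per order, every line of the same order gets the same order type, which is what the Tableau 'Fixed' calculation was doing.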

SQL Select row depending on values in different columns

I've already found so many answers here, but I can't seem to find one for my specific problem.
I can't figure out how to select a value from a row depending on the values in different columns.
With the table below, I want to achieve the following results:
If the value in column stdvpuni = 1, return the values / contents from this row for the article (column art).
If the value in column stdvpuni = 0, return the values / contents from the row where STDUNIABG = 1 for this article (column art).
You seem to want one row per art, based on the content of other rows. That suggests using row_number():
select t.*
from (select t.*,
row_number() over (partition by art order by stdvpuni desc, STDUNIABG desc) as seqnum
from t
) t
where seqnum = 1;
You don't specify what to do if neither column is 1. You might want a where clause (where 1 in (stdvpuni, STDUNIABG)) or another condition in the order by.
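Here is a minimal sketch of the row_number() approach on sample data. It assumes SQLite 3.25 or later via Python's sqlite3 module; the unit column and the four rows are hypothetical, since the original table was only shown as an image.

```python
import sqlite3

# Hypothetical rows: article A has a stdvpuni = 1 row, article B does not.
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table t (art text, unit text, stdvpuni integer, stduniabg integer)"
)
conn.executemany(
    "insert into t values (?, ?, ?, ?)",
    [
        ("A", "PCE", 1, 0),
        ("A", "BOX", 0, 1),
        ("B", "PCE", 0, 0),
        ("B", "BOX", 0, 1),
    ],
)

# Within each article, rows with stdvpuni = 1 sort first; if there is none,
# the row with stduniabg = 1 comes first. Keep only the top row per article.
rows = conn.execute(
    """
    select art, unit
    from (select t.*,
                 row_number() over (partition by art
                                    order by stdvpuni desc, stduniabg desc) as seqnum
          from t) as ranked
    where seqnum = 1
    order by art
    """
).fetchall()

print(rows)
```

Article A keeps its stdvpuni = 1 row, while article B falls back to the row where stduniabg = 1.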
I do not know what values / contents is, but I suppose that's easy for you to figure out. So, I will focus on the way to select this:
SELECT
CASE
WHEN current.stdvpuni = 1 THEN 'values / contents of current row'
ELSE 'values / contents of other row'
END
FROM yourtable current
JOIN yourtable other
ON other.art = current.art AND other.STDUNIABG = 1;
Use your conditions with NOT EXISTS in the WHERE clause:
SELECT t1.*
FROM tablename t1
WHERE t1.STDVPUNI = 1
OR (
t1.STDVPUNI = 0 AND t1.STDUNIABG = 1
AND NOT EXISTS (SELECT 1 FROM tablename t2 WHERE t2.ART = t1.ART AND t2.STDVPUNI = 1)
);

Mark values with different tags in SQL

I have one endpoint, which is 7. I would like to mark some numbers (40, 35, 30, 26, 22, 18, 12) as completed and others (13, 17, 21, 27, 32, 38, 43) as pending. These are examples; the values may differ. Can this be achieved with a SQL statement? For the number details, please see the image.
If your DBMS supports Windowed Aggregates:
with cte as
( select ID, point,
-- find all rows after the latest 7 row
sum(case when point = 7 then 1 end)
over (order by ID DESC) as cumsum
from tab
)
select ID, point,
case when point = 7 then 'endpoint'
when cumsum is null then 'pending' -- no 7 after those IDs
else 'completed'
end
from cte
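To see how the descending running sum separates the two groups, here is a minimal sketch. It assumes SQLite 3.25 or later via Python's sqlite3 module, with five made-up rows; the mark alias is added here only to name the output column.

```python
import sqlite3

# Hypothetical sample: the endpoint 7 sits at ID 3.
conn = sqlite3.connect(":memory:")
conn.execute("create table tab (id integer, point integer)")
conn.executemany(
    "insert into tab values (?, ?)",
    [(1, 40), (2, 35), (3, 7), (4, 13), (5, 17)],
)

# Walking from the highest ID downwards, cumsum stays NULL until the first 7
# is reached, so NULL means "no 7 at this ID or any later one".
rows = conn.execute(
    """
    with cte as (
        select id, point,
               sum(case when point = 7 then 1 end)
                   over (order by id desc) as cumsum
        from tab
    )
    select id,
           case when point = 7 then 'endpoint'
                when cumsum is null then 'pending'
                else 'completed'
           end as mark
    from cte
    order by id
    """
).fetchall()

for row in rows:
    print(row)
```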
If you want everything before the first "7" as "completed" and the rest as "pending", then you can use window functions and cumulative logic. One method is:
select t.*,
(case when point = 7 then null
when id < min(case when point = 7 then id end) over ()
then 'complete'
else 'pending'
end) as mark
from t ;

Create a new table with columns with case statements and max function

I have some problems in creating a new table from an old one with new columns defined by case statements.
I need to add to a new table three columns, where I compute the maximum based on different conditions. Specifically,
if time is between 1 and 3, I define a variable max_var_1_3 as max((-1)*var),
if time is between 1 and 6, I define a variable max_var_1_6 as max((-1)*var),
if time is between 1 and 12, I define a variable max_var_1_12 as max((-1)*var),
The max function needs to take the maximum value of the variable var in the window between 1 and 3, 1 and 6, 1 and 12 respectively.
I wrote this
create table new as(
select t1.*,
(case when time between 1 and 3 then MAX((-1)*var)
else var
end) as max_var_1_3,
(case when time between 1 and 6 then MAX((-1)*var)
else var
end) as max_var_1_6,
(case when time between 1 and 12 then MAX((-1)*var)
else var
end) as max_var_1_12
from old_table t1
group by time
) with data primary index time
but unfortunately it is not working. The old_table already has some columns, and I would like to import all of them and then compare the old table with the new one. I got an error saying that there should be something between ')' and ',', but I cannot work out what. I am using Teradata SQL.
Could you please help me?
Many thanks
The problem is that you have GROUP BY time in your query while trying to return all the other values with your SELECT t1.*. To make your query work as-is, you'd need to add each column from t1.* to your GROUP BY clause.
If you want to find the MAX value within the different time ranges AND also return all the rows, then you can use a window function. Something like this:
CREATE TABLE new AS (
SELECT
t1.*,
CASE
WHEN t1.time BETWEEN 1 AND 3 THEN (
MAX(CASE WHEN t1.time BETWEEN 1 AND 3 THEN (-1 * t1.var) ELSE NULL END) OVER()
)
ELSE t1.var
END AS max_var_1_3,
CASE
WHEN t1.time BETWEEN 1 AND 6 THEN (
MAX(CASE WHEN t1.time BETWEEN 1 AND 6 THEN (-1 * t1.var) ELSE NULL END) OVER()
)
ELSE t1.var
END AS max_var_1_6,
CASE
WHEN t1.time BETWEEN 1 AND 12 THEN (
MAX(CASE WHEN t1.time BETWEEN 1 AND 12 THEN (-1 * t1.var) ELSE NULL END) OVER()
)
ELSE t1.var
END AS max_var_1_12
FROM old_table t1
) WITH DATA PRIMARY INDEX (time)
;
Here's the logic:
check if a row falls in the range
if it does, return the desired MAX value for rows in that range
otherwise, just return that given row's default value (var)
return all rows along with the three new columns
If you have performance issues, you could also move the max_var calculations to a CTE, since they only need to be calculated once. Also to avoid confusion, you may want to explicitly specify the values in your SELECT instead of using t1.*.
I don't have a TD system to test, but try it out and see if that works.
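Since no Teradata system was available, here is a minimal sketch that checks only the SELECT logic, using SQLite 3.25 or later via Python's sqlite3 module as a stand-in. The id column and the four rows are hypothetical, and the Teradata-specific CREATE TABLE ... WITH DATA PRIMARY INDEX part is not reproduced.

```python
import sqlite3

# Hypothetical sample rows; -1 * var is 5, 2, 9 and -4 respectively.
conn = sqlite3.connect(":memory:")
conn.execute("create table old_table (id integer, time integer, var integer)")
conn.executemany(
    "insert into old_table values (?, ?, ?)",
    [(1, 1, -5), (2, 3, -2), (3, 5, -9), (4, 12, 4)],
)

# Each column: rows inside the time range get the range-wide max of -var
# (computed over the whole table with an empty OVER clause); rows outside
# the range keep their own var.
rows = conn.execute(
    """
    select id,
           case when time between 1 and 3
                then max(case when time between 1 and 3 then -1 * var end) over ()
                else var end as max_var_1_3,
           case when time between 1 and 6
                then max(case when time between 1 and 6 then -1 * var end) over ()
                else var end as max_var_1_6,
           case when time between 1 and 12
                then max(case when time between 1 and 12 then -1 * var end) over ()
                else var end as max_var_1_12
    from old_table
    order by id
    """
).fetchall()

for row in rows:
    print(row)
```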
I cannot help with the CREATE TABLE AS, but the query you want is this:
SELECT
t.*,
(SELECT MAX(-1 * var) FROM old_table WHERE time BETWEEN 1 AND 3) AS max_var_1_3,
(SELECT MAX(-1 * var) FROM old_table WHERE time BETWEEN 1 AND 6) AS max_var_1_6,
(SELECT MAX(-1 * var) FROM old_table WHERE time BETWEEN 1 AND 12) AS max_var_1_12
FROM old_table t;

Find the row which contains only a row with zero and one for the particular data

I have data where one group has a single row with flag zero and rank one, while for the same data another value gives one and two.
I tried the code below, which does not seem to work:
select *
from (select livecasnum, flag,
DENSE_RANK()over (partition by livecasnum order by flag) as Ranks
from TblcaseFlag
group by livecasnum, flag
) b
group by livecasnum,flag,Ranks
having count(flag + Ranks) = 1 and flag <> 1
I need only the data that has a single row with just zero and one, e.g. 99149.
Why not use NOT EXISTS instead:
select tf.*
from TblcaseFlag tf
where tf.flag = 0 and
not exists (select 1
from TblcaseFlag tf1
where tf.livecasnum = tf1.livecasnum and
tf1.flag = 1
);
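A minimal sketch of the NOT EXISTS filter on sample data, assuming SQLite via Python's sqlite3 module. Only 99149 comes from the question; the other case numbers are made up.

```python
import sqlite3

# 99149 has only a flag = 0 row; 10001 has both flags; 10002 has only flag = 1.
conn = sqlite3.connect(":memory:")
conn.execute("create table TblcaseFlag (livecasnum integer, flag integer)")
conn.executemany(
    "insert into TblcaseFlag values (?, ?)",
    [(99149, 0), (10001, 0), (10001, 1), (10002, 1)],
)

# Keep flag = 0 rows whose case number never appears with flag = 1.
rows = conn.execute(
    """
    select tf.livecasnum, tf.flag
    from TblcaseFlag tf
    where tf.flag = 0
      and not exists (select 1
                      from TblcaseFlag tf1
                      where tf1.livecasnum = tf.livecasnum
                        and tf1.flag = 1)
    order by tf.livecasnum
    """
).fetchall()

print(rows)
```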