Teradata: Recursively Subtract - sql

I have a set of data as follows:
Product Customer Sequence Amount
A 123 1 928.69
A 123 2 5032.81
A 123 3 6499.19
A 123 4 7908.57
What I want to do is recursively subtract the amounts based on the result of the previous subtraction (keeping the first amount as-is), and put the output in a 'Result' column
e.g. Subtract 0 from 928.69 = 928.69, subtract 928.69 from 5032.81 = 4104.12, subtract 4104.12 from 6499.19 = 2395.07, etc (for each product/customer)
The results I'm trying to achieve are:
Product Customer Sequence Amount Result
A 123 1 928.69 928.69
A 123 2 5032.81 4104.12
A 123 3 6499.19 2395.07
A 123 4 7908.57 5513.50
I had been trying to achieve this using combinations of LEAD & LAG, but couldn't figure out how to use the result in the next row.
I'm thinking it's possible using a recursive statement, iterating over the sequence, however I'm not familiar with teradata recursion and couldn't successfully adapt the samples I found.
Can anyone please direct me on how to format a recursive teradata SQL statement to achieve the above result? I'm also open to non-recursive options if there are any.
CREATE VOLATILE TABLE MY_TEST (Product CHAR(1), Customer INTEGER, Sequence INTEGER, Amount DECIMAL(16,2)) ON COMMIT PRESERVE ROWS;
INSERT INTO MY_TEST VALUES ('A', 123, 1, 928.69);
INSERT INTO MY_TEST VALUES ('A', 123, 2, 5032.81);
INSERT INTO MY_TEST VALUES ('A', 123, 3, 6499.19);
INSERT INTO MY_TEST VALUES ('A', 123, 4, 7908.57);

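This is an unusual calculation because expanding the recursion alternates + and -: the result for row n is amount(n) - amount(n-1) + amount(n-2) - ..., so it can be written as a running sum of signed amounts.
If you know the result is always positive, then this works: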
with t as (
select 'A' as product, 123 as customer, 1 as sequence, 928.69 as amount, 928.69 as result union all
select 'A', 123, 2, 5032.81, 4104.12 union all
select 'A', 123, 3, 6499.19, 2395.07 union all
select 'A', 123, 4, 7908.57, 5513.50
)
select t.*,
abs(sum( case when sequence mod 2 = 1 then - amount else amount end ) over (partition by product, customer order by sequence rows unbounded preceding)) as calc_result
from t;
The abs() is really a shortcut. If the resulting value could be negative, you can have an outer case expression to determine if the result should be multiplied by -1 or 1:
select t.*,
((case when sequence mod 2 = 1 then -1 else 1 end) *
sum( case when sequence mod 2 = 1 then - amount else amount end ) over (partition by product, customer order by sequence rows unbounded preceding)
) as calc_result
from t;
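To see why the signed running sum reproduces the recursion, here is a small Python sketch. Python is used only so the arithmetic can be checked outside a database; the function names are illustrative and not part of the SQL above, and the sketch assumes the sequence starts at 1 within each product/customer.

```python
from decimal import Decimal
from itertools import accumulate

# Amounts for product A / customer 123, in sequence order (from the question).
AMOUNTS = [Decimal("928.69"), Decimal("5032.81"),
           Decimal("6499.19"), Decimal("7908.57")]


def recursive_subtract(values):
    """Row by row: result(n) = amount(n) - result(n-1), with result(0) = 0."""
    results = []
    previous = Decimal("0")
    for amount in values:
        previous = amount - previous
        results.append(previous)
    return results


def alternating_running_sum(values):
    """Same result without recursion: negate odd-sequence amounts, take a
    running sum, then flip the sign back on odd-sequence rows."""
    signed = [-a if seq % 2 == 1 else a
              for seq, a in enumerate(values, start=1)]
    running = accumulate(signed)
    return [-s if seq % 2 == 1 else s
            for seq, s in enumerate(running, start=1)]


if __name__ == "__main__":
    for amount, result in zip(AMOUNTS, alternating_running_sum(AMOUNTS)):
        print(amount, result)
```

The outer sign flip in `alternating_running_sum` plays the role of the outer case expression in the second query; abs() is only a safe replacement when no result can be negative.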

select A.col_A - B.der_col_A
from table A
join (select col_B, coalesce(min(col_A) over (partition by col_B order by col_A rows between 1 following and 1 following), 0) as der_col_A
from table) B
on (A.col_B = B.col_B);
Replace col_A and col_B with your key columns: Product, Customer and Sequence in your case.

Related

Generate Identifier for consecutive rows with same value

I'm trying to get an SQL Server query that needs partitioning in a way such that consecutive rows with the same Type value ordered by date have the same unique identifier.
Let's say I have the following table
declare @test table
(
CustomerId varchar(10),
Type int,
date datetime
)
insert into @test values ('aaaa', 1, '2015-10-24 22:52:47')
insert into @test values ('bbbb', 1, '2015-10-23 22:56:47')
insert into @test values ('cccc', 2, '2015-10-22 21:52:47')
insert into @test values ('dddd', 2, '2015-10-20 22:12:47')
insert into @test values ('aaaa', 1, '2015-10-19 20:52:47')
insert into @test values ('dddd', 2, '2015-10-18 12:52:47')
insert into @test values ('aaaa', 3, '2015-10-18 12:52:47')
I want my output column to be something like this (the numbers do not need to be ordered, all I need are unique identifiers for each group).
0
0
1
1
2
3
4
Explanation: the first 2 rows have UD:0 because they both have type "1". The next row has a different type ("2"), so it gets another identifier, UD:1 in this case. The following row still has the same type, so the UD is the same. The next one has a different type ("1"), so it gets another identifier, in this case UD:2, and so on.
The customerId column is irrelevant to the query, the condition should be based on the Type and Date column
My current query almost does the trick, but it fails in some cases, giving the same ID to rows with different Type values.
SELECT
ROW_NUMBER() OVER (ORDER BY date) -
ROW_NUMBER() OVER (PARTITION BY Type ORDER BY date)
FROM @test
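This is a gaps-and-islands problem, and the traditional solution applies: flag each row whose type differs from the previous row's, then take a running sum of the flags to get the group number.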
For example:
select
*,
sum(inc) over(order by date desc, type) as grp
from (
select *,
case when type <> lag(type) over(order by date desc, type)
then 1 else 0 end as inc
from test
) x
order by date desc, type
Result:
CustomerId Type date inc grp
----------- ----- --------------------- ---- ---
aaaa 1 2015-10-24T22:52:47Z 0 0
bbbb 1 2015-10-23T22:56:47Z 0 0
cccc 2 2015-10-22T21:52:47Z 1 1
dddd 2 2015-10-20T22:12:47Z 0 1
aaaa 1 2015-10-19T20:52:47Z 1 2
dddd 2 2015-10-18T12:52:47Z 1 3
aaaa 3 2015-10-18T12:52:47Z 1 4
See example at SQL Fiddle.
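The flag-then-running-sum idea can be checked outside the database with a short Python sketch. The function name is illustrative only, and the input is assumed to be the Type column already sorted by date descending, as in the query above.

```python
from itertools import accumulate


def island_ids(types):
    """Give consecutive equal values the same id.

    Mirrors the SQL: inc is 1 when the value differs from the previous
    row (the LAG comparison), and the id is the running sum of inc.
    """
    if not types:
        return []
    # The first row has no predecessor, so its flag is 0 (LAG returns NULL).
    inc = [0] + [1 if cur != prev else 0
                 for prev, cur in zip(types, types[1:])]
    return list(accumulate(inc))


if __name__ == "__main__":
    # Type column from the question, ordered by date descending.
    print(island_ids([1, 1, 2, 2, 1, 2, 3]))
```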

Remove duplicates from single field only in rollup query

I have a table of data for individual audits on inventory. Every audit has a location, an expected value, a variance value, and some other data that aren't really important here.
I am writing a query for Cognos 11 which summarizes a week of these audits. Currently, it rolls everything up into sums by location class. My problem is that there may be multiple audits for individual locations and while I want the variance field to sum the data from all audits regardless of whether it's the first count on that location, I only want the expected value for distinct locations (i.e. only SUM expected value where the location is distinct).
Below is a simplified version of the query. Is this even possible or will I have to write a separate query in Cognos and make it two reports that will have to be combined after the fact? As you can likely tell, I'm fairly new to SQL and Cognos.
SELECT COALESCE(CASE
WHEN location_class = 'A'
THEN 'Active'
WHEN location_class = 'C'
THEN 'Active'
WHEN location_class IN (
'R'
,'0'
)
THEN 'Reserve'
END, 'Grand Total') "Row Labels"
,SUM(NVL(expected_cost, 0)) "Sum of Expected Cost"
,SUM(NVL(variance_cost, 0)) "Sum of Variance Cost"
,SUM(ABS(NVL(variance_cost, 0))) "Sum of Absolute Cost"
,COUNT(DISTINCT location) "Count of Locations"
,(SUM(NVL(variance_cost, 0)) / SUM(NVL(expected_cost, 0))) "Variance"
FROM audit_table
WHERE audit_datetime <= #prompt('EndDate') # AND audit_datetime >= #prompt('StartDate') #
GROUP BY ROLLUP(CASE
WHEN location_class = 'A'
THEN 'Active'
WHEN location_class = 'C'
THEN 'Active'
WHEN location_class IN (
'R'
,'0'
)
THEN 'Reserve'
END)
ORDER BY 1 ASC
This is what I'm hoping to end up with:
Thanks for any help!
Have you tried taking a look at the OVER clause in SQL? It allows you to use window functions within a result set so that you can get aggregates based on specific conditions. This would probably help, since you seem to be trying to get a summation of data based on a different grouping within a larger grouping.
For example, let's say we have the below dataset:
group1 group2 val dateadded
----------- ----------- ----------- -----------------------
1 1 1 2020-11-18
1 1 1 2020-11-20
1 2 10 2020-11-18
1 2 10 2020-11-20
2 3 100 2020-11-18
2 3 100 2020-11-20
2 4 1000 2020-11-18
2 4 1000 2020-11-20
Using a single query we can return both the sums of "val" over "group1" as well as the summation of the first (based on datetime) "val" records in "group2":
declare @table table (group1 int, group2 int, val int, dateadded datetime)
insert into @table values (1, 1, 1, getdate())
insert into @table values (1, 1, 1, dateadd(day, 1, getdate()))
insert into @table values (1, 2, 10, getdate())
insert into @table values (1, 2, 10, dateadd(day, 1, getdate()))
insert into @table values (2, 3, 100, getdate())
insert into @table values (2, 3, 100, dateadd(day, 1, getdate()))
insert into @table values (2, 4, 1000, getdate())
insert into @table values (2, 4, 1000, dateadd(day, 1, getdate()))
select t.group1, sum(t.val) as group1_sum, group2_first_val_sum
from @table t
inner join
(
select group1, sum(group2_first_val) as group2_first_val_sum
from
(
select group1, val as group2_first_val, row_number() over (partition by group2 order by dateadded) as rownumber
from @table
) y
where rownumber = 1
group by group1
) x on t.group1 = x.group1
group by t.group1, x.group2_first_val_sum
This returns the below result set:
group1 group1_sum group2_first_val_sum
----------- ----------- --------------------
1 22 11
2 2200 1100
The innermost subquery in the joined table numbers the rows in the data set within each "group2", so every record gets either a "1" or a "2" in the "rownumber" column, since there are only 2 records in each "group2".
The next subquery takes that data, filters out any rows that are not the first (rownumber = 1), and sums the "val" data.
The main query gets the sum of "val" in each "group1" from the main table and then joins on the subqueried table to get the "val" sum of only the first records in each "group2".
There are more efficient ways to write this such as moving the summation of the "group1" values to a subquery in the SELECT statement to get rid of one of the nested tabled subqueries, but I wanted to show how to do it without subqueries in the SELECT statement.
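As a cross-check of the numbers, the same two aggregates can be computed in a few lines of Python. This is only a sketch of the logic, not a replacement for the query; the function name is made up for illustration, and the dates are the ones shown in the sample dataset, written as ISO strings so that they sort correctly as text.

```python
from collections import defaultdict

# (group1, group2, val, dateadded) rows from the sample dataset.
ROWS = [
    (1, 1, 1, "2020-11-18"), (1, 1, 1, "2020-11-20"),
    (1, 2, 10, "2020-11-18"), (1, 2, 10, "2020-11-20"),
    (2, 3, 100, "2020-11-18"), (2, 3, 100, "2020-11-20"),
    (2, 4, 1000, "2020-11-18"), (2, 4, 1000, "2020-11-20"),
]


def summarize(rows):
    """Return {group1: (sum of all val, sum of the earliest val per group2)}."""
    total = defaultdict(int)
    earliest = {}  # group2 -> (dateadded, group1, val) of its first record
    for group1, group2, val, dateadded in rows:
        total[group1] += val
        # Equivalent of keeping only row_number() = 1 per group2.
        if group2 not in earliest or dateadded < earliest[group2][0]:
            earliest[group2] = (dateadded, group1, val)

    first_total = defaultdict(int)
    for _, group1, val in earliest.values():
        first_total[group1] += val
    return {g: (total[g], first_total[g]) for g in sorted(total)}


if __name__ == "__main__":
    print(summarize(ROWS))
```

In the audit question, group1 would correspond to the location class, group2 to the location, and val to the expected cost.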
Have you tried putting the distinct count at the bottom, like this?
(SUM(NVL(variance_cost,0)) / SUM(NVL(expected_cost,0))) "Variance",
COUNT(DISTINCT location) "Count of Locations"
FROM audit_table

sql make geometric sequence from series of bit values

I have this table:
declare @Table table (value int)
insert @Table select 0
insert @Table select 1
insert @Table select 1
insert @Table select 1
insert @Table select 0
insert @Table select 1
insert @Table select 1
Now, I need to write a SELECT query that adds a column. This column should contain a geometric sequence wherever there is a series of 1s in the value column.
This would be the result:
I would phrase this as an arithmetic problem. First, your problem suggests that the ordering of rows is important. Hence, you need a column to specify the ordering. I assume there is an id column with this information.
Then to create the groups where the sequences start, do a cumulative sum of the 0s -- all the 1 are in the same group. Given the data you can express this as sum(1 - value) over (order by id).
Then just use arithmetic:
select t.*,
value * power(2, row_number() over (partition by grp order by id) - 1) as generatedsequence
from (select t.*, sum(1 - value) over (order by id) as grp
from @table t
) t;
Here is a db<>fiddle.
The arithmetic is that you want to enumerate the values in the group and then raise 2 to that power (except when value is 0). So the subquery returns:
id value grp
1 0 1
2 1 1
3 1 1
4 1 1
5 0 2
6 1 2
7 1 2
The row_number() then enumerates the values within each grp.
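The arithmetic can be traced with a short Python sketch. The function name and the `base` parameter are illustrative, and the input is assumed to be the value column already in id order.

```python
def geometric_sequence(values, base=2):
    """Mirror the query: grp is the running sum of (1 - value), and each row
    gets value * base ** (row number within grp - 1)."""
    result = []
    grp = 0
    previous_grp = None
    row_number = 0
    for value in values:
        grp += 1 - value  # every 0 starts a new group
        row_number = row_number + 1 if grp == previous_grp else 1
        previous_grp = grp
        result.append(value * base ** (row_number - 1))
    return result


if __name__ == "__main__":
    # Values from the question, in insertion order.
    print(geometric_sequence([0, 1, 1, 1, 0, 1, 1]))
```

Because each 0 occupies row number 1 of its group, the first 1 after a 0 gets 2 to the power 1; multiplying by value is what turns the 0 rows into 0.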
OK.. first things first, in a database there is no inherent ordering of the data within a table. Therefore, to do what you want, you will need to make a field to sort/order on. In this case, I'm using an IDENTITY field called 'SortID'.
CREATE TABLE #Table (SortID int IDENTITY(1,1), BitValue bit);
INSERT INTO #Table (BitValue)
VALUES (0), (1), (1), (1), (0), (1), (1);
This gives a table with the following starting data
SortID BitValue
1 0
2 1
3 1
4 1
5 0
6 1
7 1
Now, to solve the problem
One way to do it is via a recursive CTE - where the value of the current row is based on the values of the previous rows.
However, recursive CTEs can have performance issues (they're loops, basically) so it's better to do a set-based approach if possible.
In this case, as you want a geometric sequence which is 2 to the power of the relevant row number, we don't need the previous rows to calculate this row - we only need to know the row number
The following approach
Uses a CTE to make a new field called 'GroupNum' which is used to group the rows together. Every time a row has a BitValue of 0, it increments the GroupNum by 1.
In your example, the first four rows would have GroupNum = 1, the remaining three would have GroupNum = 2
Follows the above with a window function - partitioning by those group numbers, and getting the row_number (minus one) within each group.
The final result is set as the power of a variable @a to the relevant row_number.
To match your example, I have used @a = 2 as the base for the POWER function.
DECLARE @a int;
SET @a = 2;
WITH Grouped_BitValues AS
(SELECT SortID, BitValue,
CASE WHEN BitValue = 0 THEN 1 ELSE 0 END AS NewGrpFlag,
SUM(CASE WHEN BitValue = 0 THEN 1 ELSE 0 END) OVER (ORDER BY SortID) AS GroupNum
FROM #Table
)
SELECT BitValue, POWER(@a, ROW_NUMBER() OVER (PARTITION BY GroupNum ORDER BY SortID) -1) AS Geometric_Sequence
FROM Grouped_BitValues
ORDER BY SortID;
And here are the results
BitValue Geometric_Sequence
0 1
1 2
1 4
1 8
0 1
1 2
1 4
Note that in your question, 2^0 should be 1, not 0, for a proper geometric sequence. If instead you wanted 0, you'd need to wrap the Geometric_Sequence calculation in a CASE expression (e.g., CASE WHEN BitValue = 0 THEN 0 ELSE POWER(...) END AS Geometric_Sequence).
Here is a db<>fiddle with
the setup
the answer
the components of the answer (e.g., the CTE, and calculations) to demonstrate how it's calculated

Count length of consecutive duplicate values for each id

I have a table as shown in the screenshot (first two columns) and I need to create a column like the last one. I'm trying to calculate the length of each sequence of consecutive values for each id.
For this, the last column is required. I played around with
row_number() over (partition by id, value)
but did not have much success, since the circled number was (quite predictably) computed as 2 instead of 1.
Please help!
First of all, we need a way to define how the rows are ordered. For example, in your sample data there is no way to be sure that the 'first' row (1, 1) will always be displayed before the 'second' row (1, 0).
That's why I have added an identity column to my sample data. In your real case, the rows can be ordered by row ID, a date column or something else, but you need to ensure they can be sorted by a unique criterion.
So, the task is pretty simple:
calculate trigger switch - when value is changed
calculate groups
calculate rows
That's it. I have used common table expressions and left all the columns in so that the logic is easy to follow. You are free to break this into separate statements and remove some of the columns.
DECLARE @DataSource TABLE
(
[RowID] INT IDENTITY(1, 1)
,[ID] INT
,[value] INT
);
INSERT INTO @DataSource ([ID], [value])
VALUES (1, 1)
,(1, 0)
,(1, 0)
,(1, 1)
,(1, 1)
,(1, 1)
--
,(2, 0)
,(2, 1)
,(2, 0)
,(2, 0);
WITH DataSourceWithSwitch AS
(
SELECT *
,IIF(LAG([value]) OVER (PARTITION BY [ID] ORDER BY [RowID]) = [value], 0, 1) AS [Switch]
FROM @DataSource
), DataSourceWithGroup AS
(
SELECT *
,SUM([Switch]) OVER (PARTITION BY [ID] ORDER BY [RowID]) AS [Group]
FROM DataSourceWithSwitch
)
SELECT *
,ROW_NUMBER() OVER (PARTITION BY [ID], [Group] ORDER BY [RowID]) AS [GroupRowID]
FROM DataSourceWithGroup
ORDER BY [RowID];
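The three steps (switch, group, row number) collapse into a single pass when written procedurally. A minimal Python sketch, assuming the rows arrive as (ID, value) pairs already in RowID order; the function name is illustrative:

```python
def run_positions(rows):
    """Return each row's position within its run of consecutive equal
    (id, value) pairs, i.e. the GroupRowID column of the query."""
    positions = []
    previous = None
    count = 0
    for row in rows:
        # A change of id or value is the "switch": restart the numbering.
        count = count + 1 if row == previous else 1
        previous = row
        positions.append(count)
    return positions


if __name__ == "__main__":
    sample = [(1, 1), (1, 0), (1, 0), (1, 1), (1, 1), (1, 1),
              (2, 0), (2, 1), (2, 0), (2, 0)]
    print(run_positions(sample))
```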
You want results that depend on the actual ordering of rows in the data source. In SQL you operate on relations, and sometimes on an ordered set of a relation's rows. Your desired end result is not well-defined in terms of SQL unless you introduce an additional column in your source table by which the data is ordered (e.g. an auto-increment or timestamp column).
Note: this answers the original question and doesn't take into account additional timestamp column mentioned in the comment. I'm not updating my answer since there is already an accepted answer.
One way to solve it could be through a recursive CTE:
create table #tmp (i int identity,id int, value int, rn int);
insert into #tmp (id,value) VALUES
(1,1),(1,0),(1,0),(1,1),(1,1),(1,1),
(2,0),(2,1),(2,0),(2,0);
WITH numbered AS (
SELECT i,id,value, 1 seq FROM #tmp WHERE i=1 UNION ALL
SELECT a.i,a.id,a.value, CASE WHEN a.id=b.id AND a.value=b.value THEN b.seq+1 ELSE 1 END
FROM #tmp a INNER JOIN numbered b ON a.i=b.i+1
)
SELECT * FROM numbered -- OPTION (MAXRECURSION 1000)
This will return the following:
i id value seq
1 1 1 1
2 1 0 1
3 1 0 2
4 1 1 1
5 1 1 2
6 1 1 3
7 2 0 1
8 2 1 1
9 2 0 1
10 2 0 2
See my little demo here: https://rextester.com/ZZEIU93657
A prerequisite for the CTE to work is a sequenced table (e.g. a table with an identity column in it) as a source. In my example I introduced the column i for this. As a starting point I need to find the first entry of the source table. In my case this was the entry with i=1.
For a longer source table you might run into a recursion-limit error as the default for MAXRECURSION is 100. In this case you should uncomment the OPTION setting behind my SELECT clause above. You can either set it to a higher value (like shown) or switch it off completely by setting it to 0.
IMHO, this is easier to do with a cursor and a loop.
Maybe there is also a way to do the job with a self-join:
declare @t table (id int, val int)
insert into @t (id, val)
select 1 as id, 1 as val
union all select 1, 0
union all select 1, 0
union all select 1, 1
union all select 1, 1
union all select 1, 1
;with cte1 (id , val , num ) as
(
select id, val, row_number() over (ORDER BY (SELECT 1)) as num from @t
)
, cte2 (id, val, num, N) as
(
select id, val, num, 1 from cte1 where num = 1
union all
select t1.id, t1.val, t1.num,
case when t1.id=t2.id and t1.val=t2.val then t2.N + 1 else 1 end
from cte1 t1 inner join cte2 t2 on t1.num = t2.num + 1 where t1.num > 1
)
select * from cte2

SQL Server Sum a specific number of rows based on another column

Here are the important columns in my table
ItemId RowID CalculatedNum
1 1 3
1 2 0
1 3 5
1 4 25
1 5 0
1 6 8
1 7 14
1 8 2
.....
The rowID increments to 141 before the ItemID increments to 2. This cycle repeats for about 122 million rows.
I need to SUM the CalculatedNum field in groups of 6. So sum 1-6, then 7-12, etc. I know I end up with an odd number at the end. I can discard the last three rows (numbers 139, 140 and 141). I need it to start the SUM cycle again when I get to the next ItemID.
I know I need to group by the ItemID but I am having trouble trying to figure out how to get SQL to SUM just 6 CalculatedNum's at a time. Everything else I have come across SUMs based on a column where the values are the same.
I did find something on Microsoft's site that used the ROW_NUMBER function but I couldn't quite make sense of it. Please let me know if this question is not clear.
Thank you
You need to group by (RowId - 1) / 6 and ItemId. Like this:
drop table if exists dbo.Items;
create table dbo.Items (
ItemId int
, RowId int
, CalculatedNum int
);
insert into dbo.Items (ItemId, RowId, CalculatedNum)
values (1, 1, 3), (1, 2, 0), (1, 3, 5), (1, 4, 25)
, (1, 5, 0), (1, 6, 8), (1, 7, 14), (1, 8, 2);
select
tt.ItemId
, sum(tt.CalculatedNum) as CalcSum
from (
select
*
, (t.RowId - 1) / 6 as Grp
from dbo.Items t
) tt
group by tt.ItemId, tt.Grp
You could use integer division and group by. SQL Server does not accept a SELECT alias in GROUP BY or HAVING, so the expression is repeated there.
SELECT ItemId, (RowId-1)/6 as Batch, sum(CalculatedNum)
FROM your_table GROUP BY ItemId, (RowId-1)/6
To discard incomplete batches:
SELECT ItemId, (RowId-1)/6 as Batch, sum(CalculatedNum), count(*) as Cnt
FROM your_table GROUP BY ItemId, (RowId-1)/6 HAVING count(*) = 6
EDIT: Fix an off by one error.
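The batching arithmetic is easy to verify by hand: with RowId starting at 1, (RowId - 1) / 6 in integer division maps rows 1-6 to batch 0, rows 7-12 to batch 1, and so on. A small Python sketch of the same grouping, with a hypothetical function name and the sample rows from the question:

```python
from collections import defaultdict

# (ItemId, RowId, CalculatedNum) rows from the question's sample.
ROWS = [(1, 1, 3), (1, 2, 0), (1, 3, 5), (1, 4, 25),
        (1, 5, 0), (1, 6, 8), (1, 7, 14), (1, 8, 2)]


def batch_sums(rows, size=6, complete_only=False):
    """Sum CalculatedNum per (ItemId, batch), where batch = (RowId - 1) // size.

    With complete_only=True, batches holding fewer than `size` rows are
    dropped, like the HAVING count(*) = 6 variant.
    """
    sums = defaultdict(int)
    counts = defaultdict(int)
    for item_id, row_id, calculated_num in rows:
        key = (item_id, (row_id - 1) // size)
        sums[key] += calculated_num
        counts[key] += 1
    return {key: total for key, total in sums.items()
            if not complete_only or counts[key] == size}


if __name__ == "__main__":
    print(batch_sums(ROWS))
    print(batch_sums(ROWS, complete_only=True))
```

Because ItemId is part of the key, the batches restart for every item without any extra logic.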
To ensure you're querying 6 rows at a time, you can try the modulo operator: https://technet.microsoft.com/fr-fr/library/ms173482(v=sql.110).aspx
Hope this can help.
Thanks everyone. This was really helpful.
Here is what we ended up with.
SELECT ItemID, MIN(RowID) AS StartingRow, SUM(CalculatedNum)
FROM dbo.table
GROUP BY ItemID, (RowID - 1) / 6
ORDER BY ItemID, StartingRow
I am not sure why it did not like the integer division in the select statement but I checked the results against a sample of the data and the math is correct.