Generate Identifier for consecutive rows with same value - sql

I'm trying to write a SQL Server query that partitions the rows so that consecutive rows (ordered by date) with the same Type value share the same unique identifier.
Let's say I have the following table:
declare @test table
(
CustomerId varchar(10),
Type int,
date datetime
)
insert into @test values ('aaaa', 1, '2015-10-24 22:52:47')
insert into @test values ('bbbb', 1, '2015-10-23 22:56:47')
insert into @test values ('cccc', 2, '2015-10-22 21:52:47')
insert into @test values ('dddd', 2, '2015-10-20 22:12:47')
insert into @test values ('aaaa', 1, '2015-10-19 20:52:47')
insert into @test values ('dddd', 2, '2015-10-18 12:52:47')
insert into @test values ('aaaa', 3, '2015-10-18 12:52:47')
I want my output column to look something like this (the numbers do not need to be in order; all I need is a unique identifier for each group):
0
0
1
1
2
3
4
Explanation: the first two rows have UD 0 because they both have Type 1. The next row has a different type (2), so it gets another identifier, UD 1 in this case. The following row still has the same type, so its UD stays the same. The next one has a different type (1) again, so it gets yet another identifier, UD 2, and so on.
The CustomerId column is irrelevant to the query; the condition should be based on the Type and date columns.
My current query almost does the trick, but it fails in some cases, giving the same ID to rows with different Type values.
SELECT
ROW_NUMBER() OVER (ORDER BY date) -
ROW_NUMBER() OVER (PARTITION BY Type ORDER BY date)
FROM @test
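To see why the difference of row numbers sometimes fails, here is a small runnable check. It uses Python's built-in sqlite3 module rather than SQL Server (SQLite 3.25 or later is assumed, for window functions), and the `Type` tie-breaker in the first ORDER BY is an added assumption that makes the two rows sharing a timestamp sort deterministically. It shows that the bare difference maps rows of Type 1 and Type 2 to the same value, while the pair (Type, difference) identifies each island separately.

```python
import sqlite3

# Sample rows from the question: (CustomerId, Type, date)
rows = [
    ("aaaa", 1, "2015-10-24 22:52:47"),
    ("bbbb", 1, "2015-10-23 22:56:47"),
    ("cccc", 2, "2015-10-22 21:52:47"),
    ("dddd", 2, "2015-10-20 22:12:47"),
    ("aaaa", 1, "2015-10-19 20:52:47"),
    ("dddd", 2, "2015-10-18 12:52:47"),
    ("aaaa", 3, "2015-10-18 12:52:47"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (CustomerId TEXT, Type INTEGER, date TEXT)")
conn.executemany("INSERT INTO test VALUES (?, ?, ?)", rows)

# The question's difference of row numbers; Type is an assumed tie-breaker
# so that rows with the same timestamp get a deterministic order.
query = """
SELECT Type,
       ROW_NUMBER() OVER (ORDER BY date, Type)
     - ROW_NUMBER() OVER (PARTITION BY Type ORDER BY date) AS diff
FROM test
"""
result = conn.execute(query).fetchall()

# Collect which Type values end up sharing each diff value.
types_per_diff = {}
for type_, diff in result:
    types_per_diff.setdefault(diff, set()).add(type_)
collisions = {d: t for d, t in types_per_diff.items() if len(t) > 1}
print("diff values shared by several types:", collisions)

# Keeping Type next to the difference gives one distinct key per island.
islands = sorted(set(result))
print("islands:", islands)
```

In other words, the difference is only unique within a single Type, so any grouping built on it also needs Type in the key.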

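This is a gaps-and-islands problem, solved with the traditional approach: flag each row whose type differs from the previous row's type (using LAG), then take a running SUM of those flags so that every island gets its own number.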
For example:
select
*,
sum(inc) over(order by date desc, type) as grp
from (
select *,
case when type <> lag(type) over(order by date desc, type)
then 1 else 0 end as inc
from test
) x
order by date desc, type
Result:
CustomerId Type date inc grp
----------- ----- --------------------- ---- ---
aaaa 1 2015-10-24T22:52:47Z 0 0
bbbb 1 2015-10-23T22:56:47Z 0 0
cccc 2 2015-10-22T21:52:47Z 1 1
dddd 2 2015-10-20T22:12:47Z 0 1
aaaa 1 2015-10-19T20:52:47Z 1 2
dddd 2 2015-10-18T12:52:47Z 1 3
aaaa 3 2015-10-18T12:52:47Z 1 4
See example at SQL Fiddle.
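For a self-contained run of the same query, here is a sketch using Python's built-in sqlite3 module (assuming SQLite 3.25+ for LAG and windowed SUM). The table name `test` matches the answer; the dialect is SQLite, not SQL Server, though this particular query needs no changes.

```python
import sqlite3

rows = [
    ("aaaa", 1, "2015-10-24 22:52:47"),
    ("bbbb", 1, "2015-10-23 22:56:47"),
    ("cccc", 2, "2015-10-22 21:52:47"),
    ("dddd", 2, "2015-10-20 22:12:47"),
    ("aaaa", 1, "2015-10-19 20:52:47"),
    ("dddd", 2, "2015-10-18 12:52:47"),
    ("aaaa", 3, "2015-10-18 12:52:47"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (CustomerId TEXT, type INTEGER, date TEXT)")
conn.executemany("INSERT INTO test VALUES (?, ?, ?)", rows)

# inc = 1 when the type differs from the previous row's type (the first row's
# LAG is NULL, so the comparison is unknown and inc falls through to 0);
# grp = running total of inc, i.e. the island number.
query = """
SELECT CustomerId, type, date,
       SUM(inc) OVER (ORDER BY date DESC, type) AS grp
FROM (
    SELECT *,
           CASE WHEN type <> LAG(type) OVER (ORDER BY date DESC, type)
                THEN 1 ELSE 0 END AS inc
    FROM test
) x
ORDER BY date DESC, type
"""
grp = [row[3] for row in conn.execute(query)]
print(grp)
```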


Remove duplicates from single field only in rollup query

I have a table of data for individual audits on inventory. Every audit has a location, an expected value, a variance value, and some other data that aren't really important here.
I am writing a query for Cognos 11 that summarizes a week of these audits. Currently, it rolls everything up into sums by location class. My problem is that there may be multiple audits for an individual location, and while I want the variance field to sum the data from all audits regardless of whether it's the first count on that location, I only want the expected value for distinct locations (i.e. only SUM the expected value where the location is distinct).
Below is a simplified version of the query. Is this even possible, or will I have to write a separate query in Cognos and make two reports that have to be combined after the fact? As you can likely tell, I'm fairly new to SQL and Cognos.
SELECT COALESCE(CASE
WHEN location_class = 'A'
THEN 'Active'
WHEN location_class = 'C'
THEN 'Active'
WHEN location_class IN (
'R'
,'0'
)
THEN 'Reserve'
END, 'Grand Total') "Row Labels"
,SUM(NVL(expected_cost, 0)) "Sum of Expected Cost"
,SUM(NVL(variance_cost, 0)) "Sum of Variance Cost"
,SUM(ABS(NVL(variance_cost, 0))) "Sum of Absolute Cost"
,COUNT(DISTINCT location) "Count of Locations"
,(SUM(NVL(variance_cost, 0)) / SUM(NVL(expected_cost, 0))) "Variance"
FROM audit_table
WHERE audit_datetime <= #prompt('EndDate')# AND audit_datetime >= #prompt('StartDate')#
GROUP BY ROLLUP(CASE
WHEN location_class = 'A'
THEN 'Active'
WHEN location_class = 'C'
THEN 'Active'
WHEN location_class IN (
'R'
,'0'
)
THEN 'Reserve'
END)
ORDER BY 1 ASC
This is what I'm hoping to end up with:
Thanks for any help!
Have you tried taking a look at the OVER clause in SQL? It allows you to use windowed functions within a result set, so you can get aggregates based on specific conditions. This would probably help, since you seem to be trying to get a summation of data based on a different grouping within a larger grouping.
For example, let's say we have the below dataset:
group1 group2 val dateadded
----------- ----------- ----------- -----------------------
1 1 1 2020-11-18
1 1 1 2020-11-20
1 2 10 2020-11-18
1 2 10 2020-11-20
2 3 100 2020-11-18
2 3 100 2020-11-20
2 4 1000 2020-11-18
2 4 1000 2020-11-20
Using a single query, we can return both the sum of "val" for each "group1" and the sum of the first (by dateadded) "val" record in each "group2":
declare @table table (group1 int, group2 int, val int, dateadded datetime)
insert into @table values (1, 1, 1, getdate())
insert into @table values (1, 1, 1, dateadd(day, 1, getdate()))
insert into @table values (1, 2, 10, getdate())
insert into @table values (1, 2, 10, dateadd(day, 1, getdate()))
insert into @table values (2, 3, 100, getdate())
insert into @table values (2, 3, 100, dateadd(day, 1, getdate()))
insert into @table values (2, 4, 1000, getdate())
insert into @table values (2, 4, 1000, dateadd(day, 1, getdate()))
select t.group1, sum(t.val) as group1_sum, group2_first_val_sum
from @table t
inner join
(
select group1, sum(group2_first_val) as group2_first_val_sum
from
(
select group1, val as group2_first_val, row_number() over (partition by group2 order by dateadded) as rownumber
from @table
) y
where rownumber = 1
group by group1
) x on t.group1 = x.group1
group by t.group1, x.group2_first_val_sum
This returns the below result set:
group1 group1_sum group2_first_val_sum
----------- ----------- --------------------
1 22 11
2 2200 1100
The innermost subquery in the joined table numbers the rows within each "group2" (ordered by dateadded), so each record gets either a "1" or a "2" in the "rownumber" column, since there are only two records in each "group2".
The next subquery takes that data, filters out any rows that are not the first (rownumber = 1), and sums the remaining "val" values for each "group1".
The main query gets the sum of "val" in each "group1" from the main table and then joins to the subquery to get the "val" sum of only the first records in each "group2".
There are more efficient ways to write this, such as moving the summation of the "group1" values into a subquery in the SELECT list to get rid of one of the nested table subqueries, but I wanted to show how to do it without subqueries in the SELECT statement.
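Applied to the audit question, the same ROW_NUMBER idea might look like the sketch below: every audit contributes its variance, but only the first audit per location contributes its expected cost. It runs on SQLite through Python's sqlite3 module (3.25+ assumed), so COALESCE stands in for NVL and the ROLLUP grand-total row is left out because SQLite has no ROLLUP. The sample rows, and the assumption that each location belongs to a single class, are hypothetical.

```python
import sqlite3

# Hypothetical sample audits; location L1 is audited twice.
audits = [
    ("L1", "A", 100.0, -5.0, "2020-11-16"),
    ("L1", "A", 100.0, 3.0, "2020-11-18"),
    ("L2", "C", 50.0, 2.0, "2020-11-17"),
    ("L3", "R", 200.0, -10.0, "2020-11-17"),
]
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE audit_table (
    location TEXT, location_class TEXT,
    expected_cost REAL, variance_cost REAL, audit_datetime TEXT)""")
conn.executemany("INSERT INTO audit_table VALUES (?, ?, ?, ?, ?)", audits)

# rn = 1 marks the first audit of each location; only those rows add
# their expected cost, while every row adds its variance cost.
query = """
SELECT CASE WHEN location_class IN ('A', 'C') THEN 'Active'
            WHEN location_class IN ('R', '0') THEN 'Reserve' END AS row_label,
       SUM(CASE WHEN rn = 1 THEN COALESCE(expected_cost, 0) ELSE 0 END) AS expected,
       SUM(COALESCE(variance_cost, 0)) AS variance,
       COUNT(DISTINCT location) AS locations
FROM (
    SELECT a.*,
           ROW_NUMBER() OVER (PARTITION BY location ORDER BY audit_datetime) AS rn
    FROM audit_table a
) t
GROUP BY row_label
ORDER BY row_label
"""
result = conn.execute(query).fetchall()
print(result)
```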
Have you tried putting the DISTINCT at the bottom, like this?
(SUM(NVL(variance_cost,0)) / SUM(NVL(expected_cost,0))) "Variance",
COUNT(DISTINCT location) "Count of Locations"
FROM audit_table

sql make geometric sequence from series of bit values

I have this table:
declare @Table table (value int)
insert @Table select 0
insert @Table select 1
insert @Table select 1
insert @Table select 1
insert @Table select 0
insert @Table select 1
insert @Table select 1
Now I need a SELECT query that adds a column. This column should form a geometric sequence whenever there is a run of 1s in the value column.
This would be the result:
I would phrase this as an arithmetic problem. First, your problem suggests that the ordering of rows is important. Hence, you need a column to specify the ordering. I assume there is an id column with this information.
Then, to create the groups where the sequences start, do a cumulative sum of the 0s; every run of 1s then lands in the same group as the 0 before it. Given the data, you can express this as sum(1 - value) over (order by id).
Then just use arithmetic:
select t.*,
value * power(2, row_number() over (partition by grp order by id) - 1) as generatedsequence
from (select t.*, sum(1 - value) over (order by id) as grp
from @table t
) t;
Here is a db<>fiddle.
The arithmetic is that you want to enumerate the values in each group and then raise 2 to that position minus one (multiplying by value makes the result 0 when value is 0). So the subquery returns:
id value grp
1 0 1
2 1 1
3 1 1
4 1 1
5 0 2
6 1 2
7 1 2
The row_number() then enumerates the values within each grp.
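Here is a runnable sketch of this approach on SQLite via Python's sqlite3 module (3.25+ assumed). The table name `t` and its `id` column are assumptions, matching the answer's assumed ordering column. Because SQLite only ships POWER() in some builds, Python's `pow` is registered under that name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Register Python's pow as power(); SQLite's own POWER() is build-dependent.
conn.create_function("power", 2, pow)
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, value INTEGER)")
conn.executemany("INSERT INTO t (id, value) VALUES (?, ?)",
                 enumerate([0, 1, 1, 1, 0, 1, 1], start=1))

# grp = running count of zeros; row_number() restarts in each grp, and
# multiplying by value zeroes out the rows where value = 0.
query = """
SELECT id, value, grp,
       value * power(2, ROW_NUMBER() OVER (PARTITION BY grp ORDER BY id) - 1)
           AS generatedsequence
FROM (SELECT t.*, SUM(1 - value) OVER (ORDER BY id) AS grp FROM t) s
ORDER BY id
"""
rows = conn.execute(query).fetchall()
grps = [r[2] for r in rows]
seq = [r[3] for r in rows]
print(grps, seq)
```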
OK, first things first: in a database there is no inherent ordering of the data within a table. Therefore, to do what you want, you will need a field to sort/order on. In this case, I'm using an IDENTITY field called 'SortID'.
CREATE TABLE #Table (SortID int IDENTITY(1,1), BitValue bit);
INSERT INTO #Table (BitValue)
VALUES (0), (1), (1), (1), (0), (1), (1);
This gives a table with the following starting data:
SortID BitValue
1 0
2 1
3 1
4 1
5 0
6 1
7 1
Now, to solve the problem:
One way to do it is via a recursive CTE, where the value of the current row is based on the values of the previous rows.
However, recursive CTEs can have performance issues (they're loops, basically), so it's better to use a set-based approach if possible.
In this case, as you want a geometric sequence, which is 2 to the power of the relevant row number (minus one), we don't need the previous rows to calculate this row; we only need to know the row number.
The following approach:
Uses a CTE to make a new field called 'GroupNum', which is used to group the rows together. Every time a row has a BitValue of 0, it increments GroupNum by 1.
In your example, the first four rows would have GroupNum = 1, and the remaining three would have GroupNum = 2.
Follows the above with a window function, partitioning by those group numbers and getting the row_number (minus one) within each group.
The final result is the variable @a raised to the power of that row_number.
To match your example, I have used @a = 2 as the base for the POWER function.
DECLARE @a int;
SET @a = 2;
WITH Grouped_BitValues AS
(SELECT SortID, BitValue,
CASE WHEN BitValue = 0 THEN 1 ELSE 0 END AS NewGrpFlag,
SUM(CASE WHEN BitValue = 0 THEN 1 ELSE 0 END) OVER (ORDER BY SortID) AS GroupNum
FROM #Table
)
SELECT BitValue, POWER(@a, ROW_NUMBER() OVER (PARTITION BY GroupNum ORDER BY SortID) -1) AS Geometric_Sequence
FROM Grouped_BitValues
ORDER BY SortID;
And here are the results
BitValue Geometric_Sequence
0 1
1 2
1 4
1 8
0 1
1 2
1 4
Note that in your question, 2^0 should be 1, not 0, for a proper geometric sequence. If instead you wanted 0, you'd need to write Geometric_Sequence as a CASE expression (e.g., CASE WHEN BitValue = 0 THEN 0 ELSE POWER(...) END AS Geometric_Sequence).
Here is a db<>fiddle with:
the setup
the answer
the components of the answer (e.g., the CTE, and calculations) to demonstrate how it's calculated
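As a runnable approximation of the SortID answer, the sketch below uses SQLite through Python's sqlite3 module (3.25+ assumed). The table name `bits` is hypothetical, AUTOINCREMENT stands in for IDENTITY(1,1), the bound parameter plays the role of @a, and Python's `pow` is registered as power() because SQLite's own POWER() is build-dependent. It returns both the plain sequence and the CASE variant mentioned in the note above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.create_function("power", 2, pow)  # stand-in for T-SQL POWER()
conn.execute("CREATE TABLE bits (SortID INTEGER PRIMARY KEY AUTOINCREMENT, BitValue INTEGER)")
conn.executemany("INSERT INTO bits (BitValue) VALUES (?)",
                 [(b,) for b in [0, 1, 1, 1, 0, 1, 1]])

base = 2  # plays the role of @a
query = """
WITH Grouped_BitValues AS (
    SELECT SortID, BitValue,
           SUM(CASE WHEN BitValue = 0 THEN 1 ELSE 0 END) OVER (ORDER BY SortID) AS GroupNum
    FROM bits
)
SELECT BitValue,
       power(?, ROW_NUMBER() OVER (PARTITION BY GroupNum ORDER BY SortID) - 1)
           AS Geometric_Sequence,
       CASE WHEN BitValue = 0 THEN 0
            ELSE power(?, ROW_NUMBER() OVER (PARTITION BY GroupNum ORDER BY SortID) - 1)
       END AS Zeroed_Sequence
FROM Grouped_BitValues
ORDER BY SortID
"""
rows = conn.execute(query, (base, base)).fetchall()
geometric = [r[1] for r in rows]
zeroed = [r[2] for r in rows]
print(geometric, zeroed)
```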

Teradata: Recursively Subtract

I have a set of data as follows:
Product Customer Sequence Amount
A 123 1 928.69
A 123 2 5032.81
A 123 3 6499.19
A 123 4 7908.57
What I want to do is recursively subtract the amounts based on the result of the previous subtraction (keeping the first amount as-is), into a 'Result' column.
For example: subtract 0 from 928.69 = 928.69, subtract 928.69 from 5032.81 = 4104.12, subtract 4104.12 from 6499.19 = 2395.07, etc. (for each product/customer).
The results I'm trying to achieve are:
Product Customer Sequence Amount Result
A 123 1 928.69 928.69
A 123 2 5032.81 4104.12
A 123 3 6499.19 2395.07
A 123 4 7908.57 5513.50
I had been trying to achieve this using combinations of LEAD & LAG, but couldn't figure out how to use the result in the next row.
I'm thinking it's possible using a recursive statement, iterating over the sequence; however, I'm not familiar with Teradata recursion and couldn't successfully adapt the samples I found.
Can anyone please direct me on how to format a recursive Teradata SQL statement to achieve the above result? I'm also open to non-recursive options if there are any.
CREATE VOLATILE TABLE MY_TEST (Product CHAR(1), Customer INTEGER, Sequence INTEGER, Amount DECIMAL(16,2)) ON COMMIT PRESERVE ROWS;
INSERT INTO MY_TEST VALUES ('A', 123, 1, 928.69);
INSERT INTO MY_TEST VALUES ('A', 123, 2, 5032.81);
INSERT INTO MY_TEST VALUES ('A', 123, 3, 6499.19);
INSERT INTO MY_TEST VALUES ('A', 123, 4, 7908.57);
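This is really weird because of the alternation of + and -: if you unroll the recursion, each result is Amount(n) - Amount(n-1) + Amount(n-2) - ..., an alternating sum over all rows up to the current one, which a windowed SUM can compute directly.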
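A quick way to convince yourself of the unrolled form is to compute it both ways on the question's amounts. This is a plain-Python sketch with Decimal (to avoid float noise); it also mirrors what the windowed SUM does, namely negating odd sequence numbers in a running total and then flipping the sign back for odd n.

```python
from decimal import Decimal

amounts = [Decimal("928.69"), Decimal("5032.81"), Decimal("6499.19"), Decimal("7908.57")]

# Direct recursion: Result(n) = Amount(n) - Result(n-1), with Result(0) = 0.
recursive = []
previous = Decimal(0)
for amount in amounts:
    previous = amount - previous
    recursive.append(previous)

# Unrolled form: Result(n) = sum over k <= n of (-1)**(n - k) * Amount(k).
def alternating(n):
    return sum((-1) ** (n - k) * amounts[k - 1] for k in range(1, n + 1))

unrolled = [alternating(n) for n in range(1, len(amounts) + 1)]

# Windowed form: running total with odd sequence numbers negated,
# then multiplied by -1 when n is odd to restore the sign.
running = Decimal(0)
windowed = []
for n, amount in enumerate(amounts, start=1):
    running += -amount if n % 2 == 1 else amount
    windowed.append(-running if n % 2 == 1 else running)

print(recursive)
```

The sign flip for odd n is exactly what the abs() shortcut or the outer CASE expression below takes care of.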
If you know the value is always positive, then this works:
select t.*,
abs(sum( case when sequence mod 2 = 1 then - amount else amount end ) over (partition by product, customer order by sequence rows unbounded preceding)) as result
from MY_TEST t;
The abs() is really a shortcut. If the resulting value could be negative, you can have an outer case expression to determine if the result should be multiplied by -1 or 1:
select t.*,
((case when sequence mod 2 = 1 then -1 else 1 end) *
sum( case when sequence mod 2 = 1 then - amount else amount end ) over (partition by product, customer order by sequence rows unbounded preceding)
) as result
from MY_TEST t;
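The sign-corrected windowed query can be tried outside Teradata with this sketch, which runs it on SQLite through Python's sqlite3 module (3.25+ assumed). The only dialect change is that SQLite spells Teradata's `x MOD 2` as `x % 2`; Amount is stored as REAL here, so the result is rounded and compared with a tolerance.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_test (Product TEXT, Customer INTEGER, Sequence INTEGER, Amount REAL)")
conn.executemany("INSERT INTO my_test VALUES (?, ?, ?, ?)", [
    ("A", 123, 1, 928.69), ("A", 123, 2, 5032.81),
    ("A", 123, 3, 6499.19), ("A", 123, 4, 7908.57),
])

# Running alternating sum per product/customer, with the sign flipped
# back for odd sequence numbers.
query = """
SELECT Sequence,
       ROUND((CASE WHEN Sequence % 2 = 1 THEN -1 ELSE 1 END) *
             SUM(CASE WHEN Sequence % 2 = 1 THEN -Amount ELSE Amount END)
                 OVER (PARTITION BY Product, Customer ORDER BY Sequence
                       ROWS UNBOUNDED PRECEDING), 2) AS Result
FROM my_test
ORDER BY Sequence
"""
results = [r[1] for r in conn.execute(query)]
print(results)
```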
select A.col_A - B.der_col_A
from table A
join (select col_B,
coalesce(min(col_A) over (partition by col_B order by col_A rows between 1 following and 1 following), 0) as der_col_A
from table) B
on (A.col_B = B.col_B);
Replace col_A and col_B with your key columns (Product, Customer, and Sequence in your case).

T-SQL Select to compute a result row on preceding group/condition

How can I achieve this result using a T-SQL SELECT query?
Given this sample table:
create table sample (a int, b int)
insert into sample values (999, 10)
insert into sample values (16, 11)
insert into sample values (10, 12)
insert into sample values (25, 13)
insert into sample values (999, 20)
insert into sample values (14, 12)
insert into sample values (90, 45)
insert into sample values (18, 34)
I'm trying to achieve this output:
a b result
----------- ----------- -----------
999 10 10
16 11 10
10 12 10
25 13 10
999 20 20
14 12 20
90 45 20
18 34 20
The rule is fairly simple: if column 'a' has the special value 999, the result for that row and the following rows (until 'a' is 999 again) will be the value of column 'b'. Assume the first record will have 999 in column 'a'.
Any hints on how to implement this as a SELECT query, if possible, without using a stored procedure or function?
Thank you.
António
You can do what you want if you add a column to specify the ordering:
create table sample (
id int identity(1, 1),
a int,
b int
);
Then you can do what you want by finding the "999" version that is most recent and copying that value. Here is a method using window functions:
select a, b, max(case when a = 999 then b end) over (partition by id_999) as result
from (select s.*,
max(case when a = 999 then id end) over (order by id) as id_999
from sample s
) s;
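Here is a runnable version of the window-function method on SQLite, via Python's sqlite3 module (3.25+ assumed). An INTEGER PRIMARY KEY column stands in for the IDENTITY(1, 1) id; in an empty table it numbers the inserted rows 1, 2, 3, and so on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sample (id INTEGER PRIMARY KEY, a INTEGER, b INTEGER)")
conn.executemany("INSERT INTO sample (a, b) VALUES (?, ?)", [
    (999, 10), (16, 11), (10, 12), (25, 13),
    (999, 20), (14, 12), (90, 45), (18, 34),
])

# id_999 = id of the most recent row (so far) whose a = 999;
# every row in that partition then takes that row's b.
query = """
SELECT a, b,
       MAX(CASE WHEN a = 999 THEN b END) OVER (PARTITION BY id_999) AS result
FROM (SELECT s.*,
             MAX(CASE WHEN a = 999 THEN id END) OVER (ORDER BY id) AS id_999
      FROM sample s) s
ORDER BY id
"""
results = [row[2] for row in conn.execute(query)]
print(results)
```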
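You need to have an id column. Then a correlated subquery can fetch b from the most recent row (up to the current one) where a = 999:
select cn.id, cn.a, cn.b,
(select top (1) s.b from sample as s where s.id <= cn.id and s.a = 999 order by s.id desc) as result
from sample as cn
order by cn.id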
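The correlated-subquery version can be checked the same way on SQLite through Python's sqlite3 module. SQLite has no TOP, so `ORDER BY ... LIMIT 1` is used instead; otherwise the query is unchanged.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sample (id INTEGER PRIMARY KEY, a INTEGER, b INTEGER)")
conn.executemany("INSERT INTO sample (a, b) VALUES (?, ?)", [
    (999, 10), (16, 11), (10, 12), (25, 13),
    (999, 20), (14, 12), (90, 45), (18, 34),
])

# For each row, look back for the latest row with a = 999 and take its b.
query = """
SELECT cn.id, cn.a, cn.b,
       (SELECT s.b FROM sample s
        WHERE s.id <= cn.id AND s.a = 999
        ORDER BY s.id DESC
        LIMIT 1) AS result
FROM sample AS cn
ORDER BY cn.id
"""
results = [row[3] for row in conn.execute(query)]
print(results)
```

This form is easy to read, but it runs one lookup per row, so the window-function version usually scales better on large tables.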

Fixing duplicate rows in a table

I have a table like the one below:
DECLARE @ProductTotals TABLE
(
id int,
value nvarchar(50)
)
which has the following values:
1, 'abc'
2, 'abc'
1, 'abc'
3, 'abc'
I want to update this table so that it has the following values:
1, 'abc'
2, 'abc_1'
1, 'abc'
3, 'abc_2'
Could someone help me out with this?
Use a cursor to move over the table and try to insert every row into a second temporary table. If you get a collision (technically, detected with a SELECT), you can run a second query to get the maximum number (if any) that's already appended to your item.
Once you know the maximum number in use (use ISNULL to cover the case of the first duplicate), just run an UPDATE over your original table and keep going with your scan.
Are you looking to remove duplicates? or just change the values so they aren't duplicate?
To change the values, use:
update producttotals
set value = 'abc_1'
where id =2;
update producttotals
set value = 'abc_2'
where id =3;
To find duplicate rows, run:
select id, value
from producttotals
group by id, value
having count(*) > 1;
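A quick check of the duplicate finder on the question's data, run on SQLite through Python's sqlite3 module (the extra `copies` column is added only to show how many times each pair occurs):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE producttotals (id INTEGER, value TEXT)")
conn.executemany("INSERT INTO producttotals VALUES (?, ?)",
                 [(1, "abc"), (2, "abc"), (1, "abc"), (3, "abc")])

# A (id, value) pair is duplicated when it occurs more than once.
dupes = conn.execute("""
SELECT id, value, COUNT(*) AS copies
FROM producttotals
GROUP BY id, value
HAVING COUNT(*) > 1
""").fetchall()
print(dupes)
```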
Assuming SQL Server 2008 or later (the multi-row VALUES list below needs 2008; the CTEs and ROW_NUMBER() already work in 2005):
DECLARE @ProductTotals TABLE
(
id int,
value nvarchar(50)
)
INSERT INTO @ProductTotals
VALUES (1, 'abc'),
(2, 'abc'),
(1, 'abc'),
(3, 'abc')
;WITH CTE as
(SELECT
ROW_NUMBER() OVER (Partition by value order by id) rn,
id,
value
FROM
@ProductTotals),
new_values as (
SELECT
pt.id,
pt.value,
pt.value + '_' + CAST( ROW_NUMBER() OVER (partition by pt.value order by pt.id) as varchar) new_value
FROM
@ProductTotals pt
INNER JOIN CTE
ON pt.id = CTE.id
and pt.value = CTE.value
WHERE
pt.id NOT IN (SELECT id FROM CTE WHERE rn = 1)) --remove any with the lowest ID for the value
UPDATE
pt
SET
value = nv.new_value
FROM
@ProductTotals pt
inner join new_values nv
ON pt.id = nv.id and pt.value = nv.value
SELECT * FROM @ProductTotals
Will produce the following
id value
----------- --------------------------------------------------
1 abc
2 abc_1
1 abc
3 abc_2
Explanation of the SQL
The first CTE creates a row number partitioned by value, so the numbering restarts whenever it sees a new value:
rn id value
-------------------- ----------- --------
1 1 abc
2 1 abc
3 2 abc
4 3 abc
The second CTE, called new_values, ignores any ids that are associated with an rn of 1. So the rows with rn 1 and rn 2 are removed because they share the same id (1). It also uses ROW_NUMBER() again to determine the number for new_value:
id value new_value
----------- ------ -------------
2 abc abc_1
3 abc abc_2
The final statement just updates the old value with the new value.
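The same renaming logic can be reproduced on SQLite with Python's sqlite3 module (3.25+ assumed), as sketched below. The two CTEs are kept as in the answer, with `||` replacing `+` for string concatenation. Instead of T-SQL's UPDATE ... FROM, the new values are selected first and then applied in a separate UPDATE, so the renaming cannot feed back into the CTEs while rows are being changed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ProductTotals (id INTEGER, value TEXT)")
conn.executemany("INSERT INTO ProductTotals VALUES (?, ?)",
                 [(1, "abc"), (2, "abc"), (1, "abc"), (3, "abc")])

# Step 1: compute (id, old value, new value) with the answer's two CTEs.
new_values = conn.execute("""
WITH cte AS (
    SELECT ROW_NUMBER() OVER (PARTITION BY value ORDER BY id) AS rn, id, value
    FROM ProductTotals
),
new_values AS (
    SELECT pt.id, pt.value,
           pt.value || '_' ||
           CAST(ROW_NUMBER() OVER (PARTITION BY pt.value ORDER BY pt.id) AS TEXT) AS new_value
    FROM ProductTotals pt
    JOIN cte ON pt.id = cte.id AND pt.value = cte.value
    WHERE pt.id NOT IN (SELECT id FROM cte WHERE rn = 1)
)
SELECT id, value, new_value FROM new_values ORDER BY id
""").fetchall()

# Step 2: apply the renames.
conn.executemany(
    "UPDATE ProductTotals SET value = ? WHERE id = ? AND value = ?",
    [(new, row_id, old) for row_id, old, new in new_values],
)
rows = conn.execute("SELECT id, value FROM ProductTotals ORDER BY rowid").fetchall()
print(rows)
```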