Teradata - Counting previous values - sql

I am trying to count previous values in a query as an intermediate step to accomplish another task. For each row, I want to count how many previous rows have the value 3,
for example
Type value
A 3
A 3
A 3
A 3
A 3
A 3
A 3
B 2.3
B 2.3
B 3
B 2.3
B 2.3
B 3
B 2.3
and my ideal result would be
Type value Previous 3's
A 3 0
A 3 1
A 3 2
A 3 3
A 3 4
A 3 5
A 3 6
B 2.3 7
B 2.3 7
B 3 7
B 2.3 8
B 2.3 8
B 3 8
B 2.3 9
How would I achieve this in Teradata or SQL?

SQL tables represent unordered sets. To count previous values, you need a column that specifies the ordering, and you don't have one in your question.
You can use a cumulative count or sum:
select t.*,
       count(case when value = 3 then 1 end) over
           (order by ? rows between unbounded preceding and 1 preceding) as previous_3s
from t;
The ? is for the column specifying the ordering.
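As a rough illustration of the cumulative count above, here is a minimal sketch that runs the same window expression in SQLite through Python's sqlite3 module rather than in Teradata. The seq column is an assumption, added only to supply the ordering that the question's table lacks, and window functions need SQLite 3.25 or newer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# seq is a hypothetical ordering column; the question's table has none.
conn.execute("CREATE TABLE t (seq INTEGER PRIMARY KEY, type TEXT, value REAL)")

data = [("A", 3.0)] * 7 + [
    ("B", 2.3), ("B", 2.3), ("B", 3.0), ("B", 2.3),
    ("B", 2.3), ("B", 3.0), ("B", 2.3),
]
# seq is assigned 1, 2, 3, ... in insertion order.
conn.executemany("INSERT INTO t (type, value) VALUES (?, ?)", data)

query = """
SELECT type, value,
       COUNT(CASE WHEN value = 3 THEN 1 END) OVER (
           ORDER BY seq
           ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
       ) AS previous_threes
FROM t
ORDER BY seq
"""
result = conn.execute(query).fetchall()
previous_threes = [row[2] for row in result]
print(previous_threes)
```

Because the frame ends at 1 preceding, the current row is excluded, so the first row gets 0 and each 3 is only counted from the following row onward.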

You can achieve this at least in MySQL:
create table teadata(Type varchar(1), value float);
insert into teadata(Type, value) values('A', 3);
insert into teadata(Type, value) values('A', 3);
insert into teadata(Type, value) values('A', 3);
insert into teadata(Type, value) values('B', 3);
insert into teadata(Type, value) values('B', 2.3);
select type, value, (@sum := @sum + (case when value = 3 then 1 else 0 end)) as cumesum
from teadata t cross join (select @sum := 0) params;
This will print:
+------+-------+---------+
| type | value | cumesum |
+------+-------+---------+
| A    |     3 |       1 |
| A    |     3 |       2 |
| A    |     3 |       3 |
| B    |     3 |       4 |
| B    |   2.3 |       4 |
+------+-------+---------+
The trick in this case is to use a user variable @sum together with a case expression. Note that this running count includes the current row, and that without an ORDER BY the row order is not guaranteed. This works on MySQL. Not sure about Teradata.

Related

SQL - Delete duplicate columns error [duplicate]

This question already has answers here:
How to delete duplicate rows in SQL Server?
I have the following table (TBL_VIDEO) with duplicate column entries in "TIMESTAMP", and I want to remove them only if the "CAMERA" number matches.
BEFORE:
ANALYSIS_ID | TIMESTAMP | EMOTION | CAMERA
-------------------------------------------
1 | 5 | HAPPY | 1
2 | 10 | SAD | 1
3 | 10 | SAD | 1
4 | 5 | HAPPY | 2
5 | 15 | ANGRY | 2
6 | 15 | HAPPY | 2
AFTER:
ANALYSIS_ID | TIMESTAMP | EMOTION | CAMERA
-------------------------------------------
1 | 5 | HAPPY | 1
2 | 10 | SAD | 1
4 | 5 | HAPPY | 2
5 | 15 | ANGRY | 2
I have attempted this statement, but the rows were not deleted as expected. I would appreciate any help producing a correct SQL statement. Thanks in advance!
delete y
from TBL_VIDEO y
where exists (select 1 from TBL_VIDEO y2 where y.TIMESTAMP = y2.TIMESTAMP and y2.CAMERA < y.CAMERA);
CREATE TABLE Table12
([ANALYSIS_ID] int, [TIMESTAMP] int, [EMOTION] varchar(5))
;
INSERT INTO Table12
([ANALYSIS_ID], [TIMESTAMP], [EMOTION])
VALUES
(1, 5, 'HAPPY'),
(2, 10, 'SAD'),
(3, 10, 'SAD'),
(4, 15, 'HAPPY'),
(5, 15, 'ANGRY')
;
with cte as (select *, row_number() over (partition by emotion order by [ANALYSIS_ID] ) as rn from Table12)
delete from cte
where rn>1
select * from Table12
output
ANALYSIS_ID TIMESTAMP EMOTION
1 5 HAPPY
2 10 SAD
5 15 ANGRY
You have two questions:
What is wrong with my code?
Is there a better way to delete the duplicate entries?
For the second question, it's a duplicate.
For the first question, please refer to https://learn.microsoft.com/en-us/sql/t-sql/statements/delete-transact-sql?view=sql-server-2017 (or press F1 on delete). The correct syntax is
delete y
from Table12 y
where exists (
A generic SQL command is shown below. You can substitute your own column names, condition, and table name.
DELETE T from
(
SELECT ROW_NUMBER()over(partition by column1 order by column2)a,* FROM TABLENAME
)T
where a>1
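To make the generic ROW_NUMBER pattern concrete, here is a minimal sketch applied to the question's TBL_VIDEO data, run in SQLite through Python's sqlite3 module rather than SQL Server. The column name ts (standing in for TIMESTAMP) is an assumption, and because SQLite cannot delete from a derived table, the sketch deletes by ANALYSIS_ID instead. Window functions need SQLite 3.25 or newer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# ts is a hypothetical name standing in for the TIMESTAMP column.
conn.execute(
    "CREATE TABLE tbl_video (analysis_id INTEGER, ts INTEGER, emotion TEXT, camera INTEGER)"
)
conn.executemany(
    "INSERT INTO tbl_video VALUES (?, ?, ?, ?)",
    [
        (1, 5, "HAPPY", 1), (2, 10, "SAD", 1), (3, 10, "SAD", 1),
        (4, 5, "HAPPY", 2), (5, 15, "ANGRY", 2), (6, 15, "HAPPY", 2),
    ],
)

# Number the rows within each (ts, camera) pair and delete all but the first.
conn.execute("""
    DELETE FROM tbl_video
    WHERE analysis_id IN (
        SELECT analysis_id
        FROM (
            SELECT analysis_id,
                   ROW_NUMBER() OVER (PARTITION BY ts, camera ORDER BY analysis_id) AS rn
            FROM tbl_video
        )
        WHERE rn > 1
    )
""")

remaining_ids = [
    r[0] for r in conn.execute("SELECT analysis_id FROM tbl_video ORDER BY analysis_id")
]
print(remaining_ids)
```

Partitioning by both the timestamp and the camera is what restricts the delete to duplicates on the same camera; ordering by analysis_id keeps the earliest row of each pair.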
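-- delete a row when an earlier row has the same TIMESTAMP and CAMERA
delete y
from TBL_VIDEO y
where exists (select 1
from TBL_VIDEO y2
where y2.TIMESTAMP = y.TIMESTAMP
and y2.CAMERA = y.CAMERA
and y2.ANALYSIS_ID < y.ANALYSIS_ID);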

get the nth-lowest value in a `group by` clause

Here's a tough one: I have data coming back in a temporary table foo in this form:
id n v
-- - -
1 3 1
1 3 10
1 3 100
1 3 201
1 3 300
2 1 13
2 1 21
2 1 300
4 2 1
4 2 7
4 2 19
4 2 21
4 2 300
8 1 11
Grouping by id, I need to get the row with the nth-lowest value for v based on the value in n. For example, for the group with an ID of 1, I need to get the row which has v equal to 100, since 100 is the third-lowest value for v.
Here's what the final results need to look like:
id n v
-- - -
1 3 100
2 1 13
4 2 7
8 1 11
Some notes about the data:
the number of rows for each ID may vary
n will always be the same for every row with a given ID
n for a given ID will never be greater than the number of rows with that ID
the data will already be sorted by id, then v
Bonus points if you can do it in generic SQL instead of oracle-specific stuff, but that's not a requirement (I suspect that rownum may factor prominently in any solutions). It has in my attempts, but I wind up confusing myself before I get a working solution.
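I would use the row_number function to number the rows within each id by ascending v and keep, in a CTE, only the rows whose row number is at most n. A second CTE then numbers those rows by v descending.
The row with rn = 1 is the largest of the n lowest values in the group, which is the nth-lowest value.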
CREATE TABLE foo(
id int,
n int,
v int
);
insert into foo values (1,3,1);
insert into foo values (1,3,10);
insert into foo values (1,3,100);
insert into foo values (1,3,201);
insert into foo values (1,3,300);
insert into foo values (2,1,13);
insert into foo values (2,1,21);
insert into foo values (2,1,300);
insert into foo values (4,2,1);
insert into foo values (4,2,7);
insert into foo values (4,2,19);
insert into foo values (4,2,21);
insert into foo values (4,2,300);
insert into foo values (8,1,11);
Query 1:
with cte as(
select id,n,v
from
(
select t.*, row_number() over(partition by id ,n order by v) as rn
from foo t
) t1
where rn <= n
), maxcte as (
select id,n,v, row_number() over(partition by id ,n order by v desc) rn
from cte
)
select id,n,v
from maxcte
where rn = 1
Results:
| ID | N | V |
|----|---|-----|
| 1 | 3 | 100 |
| 2 | 1 | 13 |
| 4 | 2 | 7 |
| 8 | 1 | 11 |
Use a window function:
select * from
(
select t.*, row_number() over(partition by id ,n order by v) as rn
from foo t
) t1
where t1.rn=t1.n
Since the OP's sample output seemed to need only the 3rd-lowest value, I first put the condition t1.rn=3; according to the description, though, it should be t1.rn=t1.n.
https://dbfiddle.uk/?rdbms=oracle_11.2&fiddle=65abf8d4101d2d1802c1a05ed82c9064
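For readers without an Oracle instance, here is a minimal sketch of the same rn = n filter, run in SQLite through Python's sqlite3 module (an assumption made for portability; window functions need SQLite 3.25 or newer). The alias numbered is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (id INTEGER, n INTEGER, v INTEGER)")
rows = [
    (1, 3, 1), (1, 3, 10), (1, 3, 100), (1, 3, 201), (1, 3, 300),
    (2, 1, 13), (2, 1, 21), (2, 1, 300),
    (4, 2, 1), (4, 2, 7), (4, 2, 19), (4, 2, 21), (4, 2, 300),
    (8, 1, 11),
]
conn.executemany("INSERT INTO foo VALUES (?, ?, ?)", rows)

# Number rows per id by ascending v, then keep the row whose number equals n.
nth_lowest = conn.execute("""
    SELECT id, n, v
    FROM (
        SELECT id, n, v,
               ROW_NUMBER() OVER (PARTITION BY id ORDER BY v) AS rn
        FROM foo
    ) AS numbered
    WHERE rn = n
    ORDER BY id
""").fetchall()
print(nth_lowest)
```

Ordering the window by v is what makes the row number mean "kth-lowest"; ties in v would be broken arbitrarily, so RANK or DENSE_RANK may be preferable if duplicates matter.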
If your database is version 12.1 or higher then there is a much simpler solution:
SELECT DISTINCT ID, n, NTH_VALUE(v,n) OVER (PARTITION BY ID) AS v
FROM foo
ORDER BY ID;
| ID | N | V |
|----|---|-----|
| 1 | 3 | 100 |
| 2 | 1 | 13 |
| 4 | 2 | 7 |
| 8 | 1 | 11 |
Depending on your real data, you may have to add an ORDER BY v clause to the window, together with a windowing_clause such as RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING; without them the nth row within each ID is not well defined. See NTH_VALUE.

Juggling the values of a column in oracle

In a table tab I have a column with the name of col1 and it has 5 rows with values 1 to 5.
col1
1
2
3
4
5
Now I want to write a select query that shuffles the values in col1 and puts the shuffled values in a new column.
The output below illustrates the requirement.
col1 New_col
1 3
2 5
3 4
4 1
5 2
Note: If 1 is changed to 3, then no other value in col1 may result in 3 after juggling. I have to do this for 500 rows; I am using a small example for clarity.
Please let me know if you require further clarification.
This is a step by step approach:
Try it at SQL Fiddle
Oracle 11g R2 Schema Setup:
create table t ( i int );
insert into t values (1);
insert into t values (2);
insert into t values (3);
insert into t values (4);
insert into t values (5);
Step by step query:
with
/*add a random column to shuffle*/
a as
( select i, dbms_random.value as o
from t),
/*get last element to pair it with the first*/
b as
( select i,
o,
last_Value(i) over (ORDER BY o asc
ROWS BETWEEN UNBOUNDED PRECEDING
AND UNBOUNDED FOLLOWING) AS i2
from a)
/*pair each element with the next one, take the last one as default*/
select i, LAG(i, 1, i2 ) OVER (ORDER BY o ) AS i3
from b
Results:
| I | I3 |
|---|----|
| 2 | 5 |
| 1 | 2 |
| 3 | 1 |
| 4 | 3 |
| 5 | 4 |
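The idea behind the step-by-step query can be sketched outside the database as well. The following is a minimal Python illustration, not a translation of the Oracle code: it shuffles the values, then pairs each value with its predecessor in the shuffled order, with the first value taking the last one (the role played by the LAG default above). The function name juggle and the rng parameter are hypothetical.

```python
import random


def juggle(values, rng=random):
    """Map each value to another one by rotating a random ordering by one."""
    order = list(values)
    rng.shuffle(order)  # plays the role of ordering by dbms_random.value
    # order[k - 1] is the previous element; for k == 0 it wraps to the last one.
    return {order[k]: order[k - 1] for k in range(len(order))}


mapping = juggle(range(1, 501), random.Random(0))
print(len(mapping), mapping[1])
```

Because the mapping is a one-step rotation of a single ordering, every value is used exactly once as a target, and with two or more distinct values no value is mapped to itself.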
What about this?
SELECT row_number() over (order by 1) col, col1 new_col
FROM tab
ORDER BY DBMS_RANDOM.VALUE
demo

Stored procedure: Select by matching input parameter or return the null records

I have the following case:
There is a table like this:
Id | Param | Value
------ | ------ | -------------
1 | 1 | One 1
1 | NULL | Null-Value 1
1 | 2 | Two 1
1 | 3 | Three 3
2 | NULL | Nul-Value 2
2 | 2 | Two 2
3 | NULL | Null-Value 3
4 | 1 | One 4
5 | NULL | Null-Vaue 5
6 | NULL | Null-Value 6
I have to write a stored procedure as by given input nullable parameter for "Param" I have to generate a result which will contain a table with ID and Value and the result is based on the logic -
If the input parameter is null - return all rows with null values for Param
If the parameter is not null, then return the rows which match that parameter in the Param column and also all rows (for the other IDs), which have null value as Param.
There MUST be only ONE result per Id.
Consider that there is applied an unique composite index on the Id and Param columns.
Example:
Input parameter: 1
Output table:
Id | Value
-- | -------------
1 | One 1
2 | Nul-Value 2
3 | Null-Value 3
4 | One 4
5 | Null-Vaue 5
6 | Null-Value 6
Example 2 (let me include the Param column as well for better visibility):
Input parameter: 2
Output table:
Id | Param | Value
-- | ----- |-------------
1 | 2 |Two 1
2 | 2 |Two 2
3 | NULL |Null-Value 3
5 | NULL |Null-Value 5
6 | NULL |Null-Value 6
I suppose it will be a join of the table with itself, or even better with cross(or maybe outer) apply and some proper where clause...
This returns the matching rows, but it can return multiple results per id. You don't give any rules for which one you want per id, so I can't do the next step.
SELECT *
FROM tableyoudidnotsaythenameof as x
WHERE (coalesce(x.Param, @inparam) = @inparam) or
(@inparam is null and x.Param is null)
If you don't care which one is returned when there is more than one per id, this will work:
SELECT *
FROM (
SELECT *, row_number() over (partition by id order by (select null)) as rn
FROM tableyoudidnotsaythenameof as x
WHERE (coalesce(x.Param, @inparam) = @inparam) or
(@inparam is null and x.Param is null)
) zed
WHERE zed.rn = 1
To give the NULL rows the bigger rn, so that a matching non-NULL row wins:
SELECT *, row_number() over (partition by id order by CASE WHEN x.Param is null then 2 else 1 END) as rn
You could use this one without the use of joins. When @Param is null it returns all rows.
DECLARE @t TABLE (id INT, [Param] INT, Val VARCHAR(50))
INSERT INTO @t VALUES
(1, 1,'One 1'),
(1, NULL,'Null-Value 1'),
(1, 2,'Two 1'),
(1, 3,'Three 3'),
(2,NULL,'Nul-Value 2'),
(2, 2,'Two 2'),
(3, NULL,'Null-Value 3'),
(4, 1,'One 4'),
(5, NULL,'Null-Vaue 5'),
(6, NULL,'Null-Value 6')
DECLARE @Param INT
SET @Param = 1
SELECT *
FROM @t
WHERE @Param IS NULL OR (([Param] = @Param) OR ([Param] IS NULL AND Id <> @Param))
I figured it out. Here is the query:
declare @input int = 2
select t.Id, t.Value
from [dbo].[MyTable] t
outer apply (select * from MyTable t2
where t2.Param = @input
and t.Id = t2.Id) mt
where (t.Id = mt.Id and t.Param = @input)
or (t.Param is null and mt.Param is null)
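To check the selection rule against the examples in the question, here is a minimal sketch that expresses it with NOT EXISTS and runs it in SQLite through Python's sqlite3 module. This is an alternative formulation, not the OUTER APPLY query above; the table name my_table and the helper select_for are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (id INTEGER, param INTEGER, value TEXT)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?)",
    [
        (1, 1, "One 1"), (1, None, "Null-Value 1"), (1, 2, "Two 1"), (1, 3, "Three 3"),
        (2, None, "Nul-Value 2"), (2, 2, "Two 2"),
        (3, None, "Null-Value 3"),
        (4, 1, "One 4"),
        (5, None, "Null-Vaue 5"),
        (6, None, "Null-Value 6"),
    ],
)


def select_for(param):
    """Return the matching row per id, else that id's NULL-param row."""
    sql = """
        SELECT t.id, t.value
        FROM my_table t
        WHERE t.param = :p
           OR (t.param IS NULL
               AND NOT EXISTS (SELECT 1 FROM my_table t2
                               WHERE t2.id = t.id AND t2.param = :p))
        ORDER BY t.id
    """
    return conn.execute(sql, {"p": param}).fetchall()


print(select_for(1))
print(select_for(2))
print(select_for(None))
```

When the parameter is NULL, the comparison t.param = :p is never true, so only the NULL rows come back; the unique index on (Id, Param) is what guarantees at most one row per id.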

Recursive SQL statement (Postgresql) - simplified version

This is simplified question for more complicated one posted here:
Recursive SQL statement (PostgreSQL 9.1.4)
Simplified question
Suppose you have an upper triangular matrix stored in 3 columns (RowIndex, ColumnIndex, MatrixValue):
            ColumnIndex
RowIndex    1    2    3    4    5
       1    2    2    3    3    4
       2    4    4    5    6    X
       3    3    2    2    X    X
       4    2    1    X    X    X
       5    1    X    X    X    X
X values are to be calculated using the following algorithm:
M[i,j] = (M[i-1,j]+M[i,j-1])/2
(i= rows, j = columns, M=matrix)
Example:
M[3,4] = (M[2,4]+M[3,3])/2
M[3,5] = (M[2,5]+M[3,4])/2
The full required result is:
            ColumnIndex
RowIndex    1    2    3       4       5
       1    2    2    3       3       4
       2    4    4    5       6       5
       3    3    2    2       4       4.5
       4    2    1    1.5     2.75    3.625
       5    1    1    1.25    2.00    2.8125
Sample data:
create table matrix_data (
RowIndex integer,
ColumnIndex integer,
MatrixValue numeric);
insert into matrix_data values (1,1,2);
insert into matrix_data values (1,2,2);
insert into matrix_data values (1,3,3);
insert into matrix_data values (1,4,3);
insert into matrix_data values (1,5,4);
insert into matrix_data values (2,1,4);
insert into matrix_data values (2,2,4);
insert into matrix_data values (2,3,5);
insert into matrix_data values (2,4,6);
insert into matrix_data values (3,1,3);
insert into matrix_data values (3,2,2);
insert into matrix_data values (3,3,2);
insert into matrix_data values (4,1,2);
insert into matrix_data values (4,2,1);
insert into matrix_data values (5,1,1);
Can this be done?
Test setup:
CREATE TEMP TABLE matrix (
rowindex integer,
columnindex integer,
matrixvalue numeric);
INSERT INTO matrix VALUES
(1,1,2),(1,2,2),(1,3,3),(1,4,3),(1,5,4)
,(2,1,4),(2,2,4),(2,3,5),(2,4,6)
,(3,1,3),(3,2,2),(3,3,2)
,(4,1,2),(4,2,1)
,(5,1,1);
Run INSERTs in a LOOP with DO:
DO $$
BEGIN
FOR i IN 2 .. 5 LOOP
FOR j IN 7-i .. 5 LOOP
INSERT INTO matrix
VALUES (i,j, (
SELECT sum(matrixvalue)/2
FROM matrix
WHERE (rowindex, columnindex) IN ((i-1, j),(i, j-1))
));
END LOOP;
END LOOP;
END;
$$;
See result:
SELECT * FROM matrix order BY 1,2;
This can be done in a single SQL select statement, but only because recursion is not necessary. I'll outline the solution. If you actually want the SQL code, let me know.
First, notice that the only items that contribute to the sums are along the diagonal. Now, if we follow the contribution of the value "4" in (1, 5), it contributes 4/2 to (2,5) and 4/4 to (3,5) and 4/8 to (4,5). Each time, the contribution is cut in half, because (a+b)/2 is (a/2 + b/2).
When we extend this, we start to see a pattern similar to Pascal's triangle. In fact, for any given point in the lower triangular matrix (below where you have values), you can find the diagonal elements that contribute to the value. Extend a vertical line up to hit the diagonal and a horizontal line to hit the diagonal. Those are the contributors from the diagonal row.
How much do they contribute? Well, for that we can go to Pascal's triangle. For the first diagonal below where we have values, the contributions are (1,1)/2. For the second diagonal, (1,2,1)/4. For the third, (1,3,3,1)/8 . . . and so on.
Fortunately, we can calculate the contributions for each value using a formula (the "choose" function from combinatorics). The power of 2 is easy. And, determining how far a given cell is from the diagonal is not too hard.
All of this can be combined into a single Postgres SQL statement. However, @Erwin's solution also works. I only want to put the effort into debugging the statement if his solution doesn't meet your needs.
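The Pascal's-triangle argument can be checked numerically. The sketch below is written in Python rather than SQL and is only an illustration under stated assumptions: it takes the matrix from the question, computes each missing cell directly from the diagonal using binomial weights, and compares the result with a plain iterative fill. The names closed_form and iterative_fill are hypothetical.

```python
from fractions import Fraction
from math import comb

N = 5
# Known cells (RowIndex, ColumnIndex) -> MatrixValue, as given in the question.
known = {
    (1, 1): 2, (1, 2): 2, (1, 3): 3, (1, 4): 3, (1, 5): 4,
    (2, 1): 4, (2, 2): 4, (2, 3): 5, (2, 4): 6,
    (3, 1): 3, (3, 2): 2, (3, 3): 2,
    (4, 1): 2, (4, 2): 1,
    (5, 1): 1,
}


def closed_form(i, j):
    """Value of cell (i, j) computed from the diagonal i + j == N + 1 only."""
    d = i + j - (N + 1)  # how many steps the cell lies beyond the diagonal
    if d <= 0:
        return Fraction(known[(i, j)])
    # A diagonal cell reached by `up` upward and d - up leftward steps
    # contributes with weight C(d, up) / 2**d.
    total = sum(comb(d, up) * known[(i - up, j - (d - up))] for up in range(d + 1))
    return Fraction(total, 2 ** d)


def iterative_fill():
    """Fill the missing cells with M[i,j] = (M[i-1,j] + M[i,j-1]) / 2."""
    m = {cell: Fraction(v) for cell, v in known.items()}
    for i in range(2, N + 1):
        for j in range(N + 2 - i, N + 1):
            m[(i, j)] = (m[(i - 1, j)] + m[(i, j - 1)]) / 2
    return m


filled = iterative_fill()
agree = all(
    closed_form(i, j) == filled[(i, j)]
    for i in range(1, N + 1)
    for j in range(1, N + 1)
)
print(agree, float(closed_form(5, 5)))
```

Exact fractions are used so the comparison between the two methods is not affected by floating-point rounding.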
... and here comes the recursive CTE with multiple embedded CTE's (tm):
DROP SCHEMA tmp CASCADE;
CREATE SCHEMA tmp ;
SET search_path=tmp;
CREATE TABLE matrix_data (
yyy integer,
xxx integer,
val numeric);
insert into matrix_data (yyy,xxx,val) values
(1,1,2) , (1,2,2) , (1,3,3) , (1,4,3) , (1,5,4)
, (2,1,4) , (2,2,4) , (2,3,5) , (2,4,6)
, (3,1,3) , (3,2,2) , (3,3,2)
, (4,1,2) , (4,2,1)
, (5,1,1)
;
WITH RECURSIVE rr AS (
WITH xx AS (
SELECT MIN(xxx) AS x0
, MAX(xxx) AS x1
FROM matrix_data
)
, mimax AS (
SELECT generate_series(xx.x0,xx.x1) AS xxx
FROM xx
)
, yy AS (
SELECT MIN(yyy) AS y0
, MAX(yyy) AS y1
FROM matrix_data
)
, mimay AS (
SELECT generate_series(yy.y0,yy.y1) AS yyy
FROM yy
)
, cart AS (
SELECT * FROM mimax mm
JOIN mimay my ON (1=1)
)
, empty AS (
SELECT * FROM cart ca
WHERE NOT EXISTS (
SELECT *
FROM matrix_data nx
WHERE nx.xxx = ca.xxx
AND nx.yyy = ca.yyy
)
)
, hot AS (
SELECT * FROM empty emp
WHERE EXISTS (
SELECT *
FROM matrix_data ex
WHERE ex.xxx = emp.xxx -1
AND ex.yyy = emp.yyy
)
AND EXISTS (
SELECT *
FROM matrix_data ex
WHERE ex.xxx = emp.xxx
AND ex.yyy = emp.yyy -1
)
)
-- UPDATE from here:
SELECT h.xxx,h.yyy, md.val / 2 AS val
FROM hot h
JOIN matrix_data md ON
(md.yyy = h.yyy AND md.xxx = h.xxx-1)
OR (md.yyy = h.yyy-1 AND md.xxx = h.xxx)
UNION ALL
SELECT e.xxx,e.yyy, r.val / 2 AS val
FROM empty e
JOIN rr r ON ( e.xxx = r.xxx+1 AND e.yyy = r.yyy)
OR ( e.xxx = r.xxx AND e.yyy = r.yyy+1 )
)
INSERT INTO matrix_data(yyy,xxx,val)
SELECT DISTINCT yyy,xxx
,SUM(val)
FROM rr
GROUP BY yyy,xxx
;
SELECT * FROM matrix_data
;
New result:
NOTICE: drop cascades to table tmp.matrix_data
DROP SCHEMA
CREATE SCHEMA
SET
CREATE TABLE
INSERT 0 15
INSERT 0 10
yyy | xxx | val
-----+-----+------------------------
1 | 1 | 2
1 | 2 | 2
1 | 3 | 3
1 | 4 | 3
1 | 5 | 4
2 | 1 | 4
2 | 2 | 4
2 | 3 | 5
2 | 4 | 6
3 | 1 | 3
3 | 2 | 2
3 | 3 | 2
4 | 1 | 2
4 | 2 | 1
5 | 1 | 1
2 | 5 | 5.0000000000000000
5 | 5 | 2.81250000000000000000
4 | 3 | 1.50000000000000000000
3 | 5 | 4.50000000000000000000
5 | 2 | 1.00000000000000000000
3 | 4 | 4.00000000000000000000
5 | 3 | 1.25000000000000000000
4 | 5 | 3.62500000000000000000
4 | 4 | 2.75000000000000000000
5 | 4 | 2.00000000000000000000
(25 rows)
while (select max(ColumnIndex+RowIndex) from matrix_data)<10
begin
insert matrix_data
select c1.RowIndex, c1.ColumnIndex+1, (c1.MatrixValue+c2.MatrixValue)/2
from matrix_data c1
inner join
matrix_data c2
on c1.ColumnIndex+1=c2.ColumnIndex and c1.RowIndex-1 = c2.RowIndex
where c1.RowIndex+c1.ColumnIndex=(select max(RowIndex+ColumnIndex) from matrix_data)
and c1.ColumnIndex<5
end