SQL Equivalent to LAG() to create a computed column

I'm fairly new to SQL and I'm trying to create a computed column in a table that calculates the DateDiff on a column between the current row and the previous row.
Now all is fine and dandy doing a query with Select to display this value:
SELECT *,
    CASE WHEN INCM <> LAG(INCM) OVER (ORDER BY INCM ASC, Submit_Date ASC)
         THEN 0
         ELSE DATEDIFF(mi, Submit_Date, LAG(Submit_Date) OVER (ORDER BY INCM ASC, Submit_Date ASC))
    END AS Diff
FROM [OP].[Ticket_Work_Info]
But as I recently found out, when adding a computed column with this logic to the existing table, I get an error saying that windowed functions can only appear in the SELECT or ORDER BY clauses.
I've been searching everywhere for an equivalent to LAG to use to create a computed column on a table.
In the meantime I ended up creating a view with this piece of code, but that's not really what I want to do going forward.
Can someone give me a hand?
Regards,

Here is some reasoning on why computed columns can only refer to values in the current row (and deterministic functions and constants). Consider a definition such as:
create table t (
    t_id int,
    a varchar(255),
    x int,
    prev_x as (lag(x) over (order by t_id))
);
And some sample data:
t_id   a     x
1      z     6
2      abc   28
3      z     496
This looks fine. But, consider this query:
select t.*
from t
where a <> 'abc';
Should the value of prev_x for the third row be 28 or 6?
I guess no one wanted to make a decision on this. Instead, the idea is that a row is well-defined, so the filtering conditions do not affect the values within a row.
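In practice the workaround is the one you already found: put the window function in a view, so Diff is computed at query time rather than stored. A minimal sketch of such a view, reusing the query from the question (the view name is illustrative):

create view [OP].[Ticket_Work_Info_Diff] as
select *,
    case when INCM <> lag(INCM) over (order by INCM, Submit_Date)
         then 0
         else datediff(mi, Submit_Date, lag(Submit_Date) over (order by INCM, Submit_Date))
    end as Diff
from [OP].[Ticket_Work_Info];

The alternative is to store the value in a regular column and refresh it with an UPDATE whenever rows change, but such a column can silently go stale, which is why the query-time view is usually preferred.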

Related

Redshift query for comparing current row to previous row (converting a SQL query to Redshift)

I have a table test with fields A (ID) and B (Flag). I need to add a new column C (Result) to this table, and its value will be derived from the B (Flag) field: if the flag is false, keep checking previous rows until the flag is true, then take the value of the A (ID) field and populate it in the C (Result) column. So C will hold the last value of A for which B was true.
I have the query in SQL, but when I try to use it in Redshift I get the following errors.
1st Query Option:
WITH
cte1 AS (
    SELECT A, SUM(B='T') OVER (ORDER BY A) AS group_no
    FROM test
),
cte2 AS (
    SELECT A, MIN(A) OVER (PARTITION BY group_no) AS previous_T
    FROM cte1
)
UPDATE test
JOIN cte2 USING (A)
SET test.C = cte2.previous_T;
I am getting errors on the SUM and MIN functions.
2nd Query Option:
UPDATE test
JOIN (
    SELECT A,
           @tmp := CASE WHEN B='T' THEN A ELSE @tmp END AS C
    FROM test
    JOIN (SELECT @tmp := 0) init
    ORDER BY A
) data USING (A)
SET test.C = data.C;
I am getting an error on the temporary variable.
I am new to SQL with no experience in Redshift, appreciate any help I get. Thanks!
I only have a few minutes, but I think I can get you started. Let's stick to query #1.
SUM(B='T') isn't going to work in Redshift. Look at the function DECODE(), as it will allow you to switch how a column is generated based on another column.
It looks like you want to do a rolling SUM(), so you will need a frame clause (likely ROWS UNBOUNDED PRECEDING).
It's not clear why you want the SUM() of ids. I'd think you would want MAX(), as this will give you the highest preceding id. Advise if I'm missing something.
I'd think you would want something like:
DECODE(B='T', true, A, MAX(A) OVER (ORDER BY A ROWS UNBOUNDED PRECEDING)) AS prev_id
But I don't know your data or the process you are trying to implement.
As for the UPDATE, I suggest you look at the Redshift docs. It will look more like:
…
update test set c = cte.prev_id
from cte
where test.id = cte.id;
This assumes the id is unique and that I have half a clue what you are trying to do.
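Putting those pieces together, a minimal sketch of a Redshift-compatible rewrite might look like the following; it assumes test has the columns A (a unique id), B (the 'T'/'F' flag), and C (the result) from the question, and uses a CASE expression in place of DECODE():

WITH cte AS (
    SELECT A,
           -- latest A so far whose flag is 'T'; MAX() skips the NULLs the CASE produces
           MAX(CASE WHEN B = 'T' THEN A END)
               OVER (ORDER BY A ROWS UNBOUNDED PRECEDING) AS prev_id
    FROM test
)
UPDATE test
SET C = cte.prev_id
FROM cte
WHERE test.A = cte.A;

The frame clause is required because Redshift insists on an explicit frame whenever an aggregate window function has an ORDER BY.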

Inserting values into a new column based on a LEAD() function

I have a column called Sales and I created a column sales_next. I want to base each row's value of sales_next on a LEAD function. I got this far, but for some reason I can't figure out how to update every row.
INSERT INTO superstore_sales$ (sales_next)
VALUES (
    (SELECT TOP 1
         LEAD(sales, 1) OVER (ORDER BY sales) AS sales_next
     FROM superstore_sales$
     WHERE sales_next IS NULL)
)
I have tried removing the TOP 1, and of course I get the error about the subquery returning multiple values, because I am unsure how to tell SQL to grab one value for each row using the LEAD function.
A derived table (or a CTE) is updatable, if I understand what you're trying to do this should work:
update t set sales_next = sn
from (
    select sales_next, LEAD(sales, 1) OVER (ORDER BY sales) AS sn
    from superstore_sales$
    where sales_next IS NULL
) t;
I figured it out. Aaron was right: there is no need to create the column physically, I only need to do it at query time. I figured out how both LEAD and LAG operate, and the final code was simple:
SELECT "Order ID", "Customer Name", Sales,
LEAD(Sales) OVER(ORDER BY Sales) as next_sales
FROM superstore_sales$;
Thank you for your help :)

Incorporate a concatenation and count in a SQL update command?

I am looking for a way to update records so that each entry adds a counter to the end of the string. In my case, I'm trying to update a field named FiberID. Each record should have the form JCK0.R000.Ax, where x is equal to 1, 2, 3, ..., 24.
Ideal result:
FiberID
JCK0.R000.A1
JCK0.R000.A2
JCK0.R000.A3
... and so on until it reaches A24.
This seems so useful that I'm sure it has been discussed here before, but for whatever reason I'm not seeing anything.
You could use ROW_NUMBER() and an updatable CTE:
with cte as (
    select
        fiber_id,
        concat(
            fiber_id,
            '.A',
            cast(row_number() over (partition by fiber_id order by id) as varchar(2))
        ) as new_fiber_id
    from mytable
)
update cte set fiber_id = new_fiber_id;
This assumes that you have a column called id that can be used to order records having the same fiber_id.
Side note: it is unclear why you should have exactly 24 numbers per fiber_id, and your sample data does not show that. This will assign increasing numbers to duplicate fiber_ids, regardless of how many there are.
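For illustration (assuming a hypothetical id column and three rows sharing the fiber_id from the question), the CTE would compute:

id  fiber_id    new_fiber_id
1   JCK0.R000   JCK0.R000.A1
2   JCK0.R000   JCK0.R000.A2
3   JCK0.R000   JCK0.R000.A3

and the UPDATE then writes new_fiber_id back into fiber_id.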

Select rows by index in Amazon Athena

This is a very simple question, but I can't seem to find documentation on it: how would one query rows by index (i.e. select the 10th through 20th rows in a table)?
I know there's a row_number function, but it doesn't seem to do what I want.
Do not specify any partition, so your row number will be an integer between 1 and your number of records.
SELECT row_num FROM (
    SELECT row_number() over () as row_num
    FROM your_table
)
WHERE row_num between 100000 and 100010
I seem to have found a roundabout and clunky way of doing this in Athena, so any better answers are welcome. This approach requires that your table already has some numeric column, in this case named some_numeric_column:
SELECT some_numeric_column, row_num FROM (
    SELECT some_numeric_column,
           row_number() over (order by some_numeric_column) as row_num
    FROM your_table
)
WHERE row_num between 100000 and 100010
To explain: you first select some numeric column in your data, then create a column (called row_num) of row numbers based on the order of your selected numeric column. You then wrap that in an outer SELECT because Athena doesn't support creating and then filtering on the row_num column within a single query; if you don't wrap it, Athena will spit out errors about not finding a column named row_num.

Joining Two Same-Sized Resultsets by Row Number

I have two table functions that return a single column each. One function is guaranteed to return the same number of rows as the other.
I want to insert the values into a new two-column table. One column will receive the value from the first udf, the second column the value from the second udf. The order of the inserts will be the order in which the rows are returned by the udfs.
How can I JOIN these two udfs given that they do not share a common key? I've tried using a ROW_NUMBER() but can't quite figure it out:
INSERT INTO dbo.NewTwoColumnTable (Column1, Column2)
SELECT udf1.[value], udf2.[value]
FROM dbo.udf1() udf1
INNER JOIN dbo.udf2() udf2 ON ??? = ???
This will not help you directly, but SQL does not guarantee row order unless it is asked to explicitly, so the idea that the rows will be returned in the order you expect may hold for a given set, but it is fundamentally not guaranteed to work properly. You probably want the UDFs to return a key that is associated with whatever guarantees the order.
Despite this, you can do the following:
declare @val int;
set @val = 1;

select Val1, Val2
from (select Value as Val1, ROW_NUMBER() over (order by @val) as r from dbo.udf1()) a
join (select Value as Val2, ROW_NUMBER() over (order by @val) as r from dbo.udf2()) b
    on a.r = b.r;
The variable addresses the issue of needing a column to sort by.
If you have the privileges to edit the UDFs, I think the better practice is to sort the data inside the UDF and add an ident int identity(1,1) column to the UDF's output table, which makes the ordering explicit.
The reason this might matter is that your server could split the UDF results into two packets. If the two arrive out of the order you expected, SQL could return them in the order received, which breaks the assumption that the UDF will return rows in order. This may not be an issue, but if the result is needed later for a real system, proper programming here prevents unexpected bugs later.
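A minimal sketch of that identity-column approach, as a hypothetical multi-statement table-valued function wrapping udf1 (the name, column type, and sort key are illustrative):

CREATE FUNCTION dbo.udf1_ordered()
RETURNS @result TABLE (
    ident int IDENTITY(1,1),  -- fixed at insert time, so it captures the order
    [value] varchar(255)
)
AS
BEGIN
    -- SQL Server assigns identity values following the ORDER BY of an INSERT ... SELECT
    INSERT INTO @result ([value])
    SELECT [value]
    FROM dbo.udf1()
    ORDER BY [value];
    RETURN;
END;

With an ident column in both UDFs, the join becomes a plain ON a.ident = b.ident and ROW_NUMBER() is no longer needed.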
In SQL, the "order returned by the udfs" is not guaranteed to persist (even between calls).
Try this:
WITH q1 AS
(
SELECT *, ROW_NUMBER() OVER (ORDER BY whatever1) rn
FROM udf1()
),
q2 AS
(
SELECT *, ROW_NUMBER() OVER (ORDER BY whatever2) rn
FROM udf2()
)
INSERT
INTO dbo.NewTwoColumnTable (Column1, Column2)
SELECT q1.value, q2.value
FROM q1
JOIN q2
ON q2.rn = q1.rn
PostgreSQL 9.4+ can append an INT8 column to the end of a set-returning function's result using the WITH ORDINALITY suffix:
-- set returning function WITH ORDINALITY
SELECT * FROM pg_ls_dir('.') WITH ORDINALITY AS t(ls,n);
ls | n
-----------------+----
pg_serial | 1
pg_twophase | 2
postmaster.opts | 3
pg_notify | 4
official doc: http://www.postgresql.org/docs/devel/static/functions-srf.html
related blog post: http://michael.otacoo.com/postgresql-2/postgres-9-4-feature-highlight-with-ordinality/
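Applied to this question, a sketch of the PostgreSQL version might look like this (assuming udf1() and udf2() are the set-returning functions and the target table is named as in the question):

-- join the two result sets on the ordinality column PostgreSQL appends
INSERT INTO NewTwoColumnTable (Column1, Column2)
SELECT a.val, b.val
FROM udf1() WITH ORDINALITY AS a(val, n)
JOIN udf2() WITH ORDINALITY AS b(val, n) ON a.n = b.n;

Note that the ordinality reflects the order in which each function emits its rows, which is exactly the ordering the earlier answers warn you to make the UDFs guarantee.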