SQL create increment field based on the values of another field - sql

I need to generate an increment field based on the difference between the current and previous value of another field.
So, for example, this table would look like this:
I have this data in PostgreSQL, and my query currently generates the table in the first image, but I need it to create the second one.
I would be thankful for any hints.

I would recommend using lag():
select t.*,
       (totalreply -
        lag(totalreply, 1, totalreply) over (order by month)
       ) as incremental_totalreply
from t;
Note that this uses the 3-argument form of lag(), so the first row's difference is 0 rather than NULL.
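Since I can't run PostgreSQL here, a minimal sketch of the lag()-with-default approach against SQLite (3.25+ also supports the 3-argument lag()); the sample values are invented for illustration:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (month INTEGER, totalreply INTEGER)")
# Invented sample data: cumulative reply totals per month.
con.executemany("INSERT INTO t VALUES (?, ?)", [(1, 10), (2, 25), (3, 40)])

rows = con.execute("""
    SELECT t.*,
           (totalreply -
            lag(totalreply, 1, totalreply) OVER (ORDER BY month)
           ) AS incremental_totalreply
    FROM t
    ORDER BY month
""").fetchall()
print(rows)  # [(1, 10, 0), (2, 25, 15), (3, 40, 15)]
```

The default argument makes the first row subtract itself, giving 0 instead of NULL.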

You can use a window function; try this:
select month, totalread,
       (totalread -
        lead(totalread, -1, totalread) over (order by totalread))
from table1;
From the documentation on lead():
returns value evaluated at the row that is offset rows after the current row within the partition; if there is no such row, instead return default (which must be of the same type as value). Both offset and default are evaluated with respect to the current row. If omitted, offset defaults to 1 and default to null
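lead() with a negative offset reads the previous row, so it is equivalent to lag() with a positive offset. The negative offset is PostgreSQL behavior and not portable (SQLite, for one, requires a non-negative offset), so this sketch, with invented sample data, uses the equivalent lag() spelling:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE table1 (month INTEGER, totalread INTEGER)")
con.executemany("INSERT INTO table1 VALUES (?, ?)", [(1, 5), (2, 12), (3, 30)])

# lag(totalread, 1, totalread) is the portable equivalent of
# lead(totalread, -1, totalread): both read the previous row.
rows = con.execute("""
    SELECT month, totalread,
           totalread - lag(totalread, 1, totalread) OVER (ORDER BY month)
    FROM table1
    ORDER BY month
""").fetchall()
print(rows)  # [(1, 5, 0), (2, 12, 7), (3, 30, 18)]
```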

How to write the SQL statement (Window function) in Teradata to get a derived column?

The source data is in a Teradata table (please see the attachment); the non-yellow columns are the original columns, while the yellow ones are derived.
I want to use a Teradata SQL statement to get a derived column (the "final_result" column):
The data in this table is ordered by operator, activity_finish_date.
The "induce_duration1" column comes from: the current row's "activity_finish_date" minus the previous row's "activity_finish_date".
The "induce_duration2" column comes from: the current row's "activity_finish_date" minus the current row's "activity_start_date".
The "final_result" column comes from: min(induce_duration1, induce_duration2).
Assuming your Teradata release supports LEAST/GREATEST on timestamps:
(activity_finish_date                  -- current finish
 - GREATEST(activity_start_date,      -- current start
            LAG(activity_finish_date) -- previous finish
                OVER (PARTITION BY operator
                      ORDER BY activity_finish_date))
) HOUR(4) TO SECOND
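The expression works because min(finish - prev_finish, finish - start) = finish - max(start, prev_finish). Teradata isn't available here, so a sketch of the same shape in SQLite with invented epoch-second data; SQLite's two-argument scalar max() plays the role of GREATEST:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE activity
               (operator TEXT, activity_start_date INT, activity_finish_date INT)""")
# Invented sample rows (times as epoch seconds for simplicity).
con.executemany("INSERT INTO activity VALUES (?, ?, ?)",
                [("a", 0, 100), ("a", 50, 180), ("a", 170, 300)])

rows = con.execute("""
    SELECT operator,
           activity_finish_date -
           max(activity_start_date,                 -- GREATEST stand-in
               lag(activity_finish_date) OVER
                   (PARTITION BY operator ORDER BY activity_finish_date))
           AS final_result
    FROM activity
    ORDER BY activity_finish_date
""").fetchall()
print(rows)  # [('a', None), ('a', 80), ('a', 120)]
```

The first row has no previous finish, so the result is NULL; the second row gives 180 - max(50, 100) = 80, which is indeed min(180-100, 180-50).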

get Last value of a column in sql server

I want to get the last value of a column (it is not an identity column) and increment it by the corresponding generated row number.
SELECT ISNULL(LAST_VALUE(ColumnA) OVER (ORDER BY ColumnA), 0) +
       ROW_NUMBER() OVER (ORDER BY ColumnA)
FROM myTable
I am calling my stored procedure recursively, which is why I thought of this logic.
But it is not working.
Basically I wanted 1-9 on the first run, 10-19 on the second run (if called recursively twice), and so on.
Total stab in the dark, but I suspect "not working" means "returning the current row's value." Don't forget that an OVER clause with an ORDER BY defaults to the window frame RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW when one isn't explicitly specified (see SELECT - OVER Clause (Transact-SQL) - ORDER BY).
ORDER BY
ORDER BY order_by_expression [COLLATE collation_name] [ASC|DESC]
Defines the logical order of the rows within each partition of the result set. That is, it specifies the logical order in which the window function calculation is performed.
If it is not specified, the default order is ASC and window function will use all rows in partition.
If it is specified, and a ROWS/RANGE is not specified, then default RANGE UNBOUNDED PRECEDING AND CURRENT ROW is used as default for window frame by the functions that can accept optional ROWS/RANGE specification (for example min or max).
As you haven't defined the window frame, that's what your LAST_VALUE function is using. Define that you want the whole partition:
SELECT ISNULL(LAST_VALUE(ColumnA) OVER (ORDER BY ColumnA ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING), 0) +
ROW_NUMBER() OVER (ORDER BY ColumnA)
FROM dbo.myTable;
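SQL Server isn't needed to see the effect; the same default-frame rule applies to SQLite's last_value(), so a minimal sketch with invented data comparing the default frame against the explicit full frame:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE myTable (ColumnA INTEGER)")
con.executemany("INSERT INTO myTable VALUES (?)", [(3,), (7,), (9,)])

rows = con.execute("""
    SELECT ColumnA,
           -- default frame: RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW,
           -- so the "last" value is the current row's value
           last_value(ColumnA) OVER (ORDER BY ColumnA) AS default_frame,
           -- explicit full frame: the true last value of the column
           last_value(ColumnA) OVER (ORDER BY ColumnA
               ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
           ) AS full_frame
    FROM myTable
    ORDER BY ColumnA
""").fetchall()
print(rows)  # [(3, 3, 9), (7, 7, 9), (9, 9, 9)]
```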
Though what Gordon says in their comment is the real solution:
you should be using an identity column or a sequence.
This type of solution can (and will) suffer race conditions, and can end up reusing "identities" when it shouldn't.

How to add a numerical value into a window frame in SQLite?

I am having some difficulty adding a numerical offset to my window frame specification in SQLite. I am using SQLite from R, although if you know how to do this in plain SQL that's also helpful.
Here is a link to the SQLite window function documentation, although it's a bit hard to understand where I should place my numerical value:
https://www.sqlite.org/windowfunctions.html
In particular I am looking at the frame boundary section.
I keep receiving the error message:
Error: unsupported frame specification
Any ideas?
My code is the following:
"create temp table forward_looking as
SELECT *,
COUNT( CASE channel WHEN 'called_office' THEN 1 ELSE null END)
OVER (PARTITION by special_digs
ORDER BY time
RANGE FOLLOWING 604800)
AS new_count
from my_data
")
Basically, the code should look at the time column (which is in unix epoch time), look 7 days ahead (604800 seconds), and add a count to new_count, doing this row by row.
I think I may have the numeric in the RANGE FOLLOWING part the wrong way around?
I think that you want:
create temp table forward_looking as
select
    d.*,
    count(*) filter (where channel = 'called_office') over (
        partition by special_digs
        order by time
        range between current row and 604800 following
    ) as new_count
from my_data d;
That is, the range clause requires both a starting and an ending bound (between ... and ...).
Note that I also rewrote the conditional count using the standard filter clause, which makes the logic more obvious.
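A runnable sketch of this bounded forward-looking frame (SQLite 3.28+ for RANGE offsets, 3.30+ for FILTER on window functions); the sample rows are invented, with time as epoch seconds:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE my_data (special_digs TEXT, time INT, channel TEXT)")
con.executemany("INSERT INTO my_data VALUES (?, ?, ?)", [
    ("x", 0,       "called_office"),
    ("x", 259200,  "called_office"),  # 3 days later: inside the first row's window
    ("x", 1209600, "email"),          # 14 days later: outside every earlier window
])

rows = con.execute("""
    SELECT time, channel,
           count(*) FILTER (WHERE channel = 'called_office') OVER (
               PARTITION BY special_digs
               ORDER BY time
               RANGE BETWEEN CURRENT ROW AND 604800 FOLLOWING
           ) AS new_count
    FROM my_data
    ORDER BY time
""").fetchall()
print(rows)  # [(0, 'called_office', 2), (259200, 'called_office', 1), (1209600, 'email', 0)]
```

Each row counts the 'called_office' events from its own time up to 604800 seconds (7 days) ahead.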

Cumulative count for calculating daily frequency using SQL query (in Amazon Redshift)

I have a dataset containing 'UI' (unique id), time, and frequency (the frequency of the given value in the UI column), as shown here:
I would like to add a new column named 'daily_frequency' which counts each unique value in the UI column sequentially within a given day, as I show in the image below.
For example, if UI=114737 is repeated 2 times in one day, we should have 1 and 2 in the daily_frequency column.
I could do that with Python and the Pandas package, using the groupby and cumcount methods as follows ...
df['daily_frequency'] = df.groupby(['UI','day']).cumcount()+1
However, for some reason, I must do this via SQL queries (Amazon Redshift).
I think you want a running count, which could be calculated as:
COUNT(*) OVER (PARTITION BY ui, TRUNC(time) ORDER BY time
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS daily_frequency
Although Salman's answer seems to be correct, I think ROW_NUMBER() is simpler:
ROW_NUMBER() OVER (PARTITION BY ui, time::date
                   ORDER BY time
                  ) AS daily_frequency
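Redshift isn't available here, but the same query runs in SQLite if the time::date cast is spelled date(time); the sample data is invented, reusing the UI value from the question:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE events (ui INTEGER, time TEXT)")
con.executemany("INSERT INTO events VALUES (?, ?)", [
    (114737, "2019-04-17 09:00:00"),
    (114737, "2019-04-17 15:30:00"),  # same UI, same day -> daily_frequency 2
    (114737, "2019-04-18 08:00:00"),  # next day -> restarts at 1
])

rows = con.execute("""
    SELECT ui, time,
           row_number() OVER (PARTITION BY ui, date(time)
                              ORDER BY time) AS daily_frequency
    FROM events
    ORDER BY time
""").fetchall()
print(rows)  # daily_frequency is 1, 2, 1
```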

How to read 2 or more records in Hive UDF?

I have a table of toll station logs. My task, "translated" into SQL, is:
step 1. group these records, using GROUP BY station, lane.
step 2. order these records, using ORDER BY check_time.
step 3. [this is the problem] for every two contiguous records in each group, judge whether the interval between them is less than 5 seconds.
This would be easy in C, Java, or other languages, but not in SQL.
It seems that a Hive UDF (user-defined function) could help me do that. I have read the demo UDF from the official documentation, but I still don't know how to pass two consecutive records into my function. Any advice?
You can do it using SQL.
Using the LAG() analytic function you can get the previous row's check_time (and other columns if necessary). Then do a calculation with the two timestamps: convert them to seconds using unix_timestamp() and subtract:
select t.*,
       case when time_diff < 5 then ... else ... end  -- do some logic
from
(
    select t.*,
           -- current time minus previous time
           unix_timestamp(check_time) -
           unix_timestamp(lag(check_time) over (partition by station, lane order by check_time)) as time_diff
    from table t
) t
Similarly, the LEAD() analytic function gets the next row's check_time (or other columns) if necessary.
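Hive isn't available here, but the same shape runs in SQLite with strftime('%s', ...) standing in for unix_timestamp(); the sample rows and the 'close'/'ok' labels are invented:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE logs (station TEXT, lane TEXT, check_time TEXT)")
con.executemany("INSERT INTO logs VALUES (?, ?, ?)", [
    ("s1", "l1", "2020-01-01 00:00:00"),
    ("s1", "l1", "2020-01-01 00:00:03"),  # 3 s gap: under the 5 s threshold
    ("s1", "l1", "2020-01-01 00:00:20"),  # 17 s gap
])

rows = con.execute("""
    SELECT t.*,
           -- the first row of each group has no predecessor, so time_diff is
           -- NULL and falls through to the ELSE branch
           CASE WHEN time_diff < 5 THEN 'close' ELSE 'ok' END AS flag
    FROM (
        SELECT l.*,
               strftime('%s', check_time) -
               strftime('%s', lag(check_time) OVER
                   (PARTITION BY station, lane ORDER BY check_time)) AS time_diff
        FROM logs l
    ) t
    ORDER BY check_time
""").fetchall()
print(rows)  # time_diff is NULL, 3, 17; flags are 'ok', 'close', 'ok'
```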