I'm working on a forecast model and am stuck automating the trend over time in SQL. What I'm trying to do is multiply the first row by a previously derived number, then multiply each subsequent row by the previous row's calculated value. Here is a basic visualization:
date        num_reqs  cumulative_value  cumulative_value formula
2019-10-01  246.4     276               num_reqs * 1.12
2019-10-02  246.4     309               previous cum_value * 1.12
2019-10-03  246.4     346               previous cum_value * 1.12
2019-10-04  246.4     388               previous cum_value * 1.12
2019-10-05  246.4     435               previous cum_value * 1.12
I've tried a few variations of lag(), but I don't think lag() supports this kind of accumulation. I've also tried exp(), but it doesn't work with my values.
You can use exponentiation to do this. I think you want:
select t.*,
       num_reqs * power(1.12, row_number() over (order by date)) as cumulative_value
from t;
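As a quick sanity check against the sample data, the exponent produced by row_number() reproduces the compounding by hand:

246.4 * 1.12^1 = 275.97  -- rounds to 276
246.4 * 1.12^2 = 309.08  -- rounds to 309
246.4 * 1.12^3 = 346.17  -- rounds to 346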
It would be awesome if there were a way to index rows during a query.
Is there a way to SELECT (compute) the difference of a single column between consecutive rows?
Let's say, something like the following query:
SELECT
    toStartOfDay(stamp) AS day,
    count(day) AS events,
    events[current] - events[previous] AS difference, -- how do I calculate this
    events[current] / events[previous] AS percent     -- and this
FROM records
GROUP BY day
ORDER BY day
I want to get the integer and percentage difference between the current row's 'events' column and the previous one for something similar to this:
day                  events  difference  percent
2022-01-06 00:00:00  197     NULL        NULL
2022-01-07 00:00:00  656     459         3.32
2022-01-08 00:00:00  15      -641        0.02
2022-01-09 00:00:00  7       -8          0.46
2022-01-10 00:00:00  137     130         19.5
My version of ClickHouse doesn't support window functions, but while reading about the LAG() function mentioned in the comments I found neighbor(), which works perfectly for what I'm trying to do:
SELECT
    toStartOfDay(stamp) AS day,
    count(day) AS events,
    events - neighbor(events, -1) AS diff,
    events / neighbor(events, -1) AS perc
FROM records
GROUP BY day
ORDER BY day
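One caveat worth noting: neighbor() is computed within a single data block, so on result sets that span multiple blocks the previous-row lookup can reset at block boundaries. On ClickHouse versions that do support window functions, a block-safe sketch of the same query (assuming the same records table as above) would be:

SELECT
    toStartOfDay(stamp) AS day,
    count(day) AS events,
    events - lagInFrame(events, 1) OVER (ORDER BY day ROWS BETWEEN 1 PRECEDING AND CURRENT ROW) AS diff,
    events / lagInFrame(events, 1) OVER (ORDER BY day ROWS BETWEEN 1 PRECEDING AND CURRENT ROW) AS perc
FROM records
GROUP BY day
ORDER BY day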
I want to calculate the difference between the energy field and the next row's energy value. I tried with my query below, but I think the result is still wrong.
Here is my code:
SELECT
    datapm.id,
    datapm.tgl,
    CONVERT(CHAR(5), datapm.stamp, 108) stamp,
    datapm.pmid,
    datapm.vavg,
    datapm.pf,
    (CAST(datapm.energy AS FLOAT) - (SELECT TOP 1 CAST(energy AS FLOAT)
                                     FROM datapm AS dt2
                                     WHERE dt2.id > datapm.id
                                       AND dt2.tgl = datapm.tgl)) AS energy
FROM
    datapm
GROUP BY
    datapm.id,
    datapm.tgl,
    datapm.stamp,
    datapm.pmid,
    datapm.vavg,
    datapm.pf,
    datapm.energy
ORDER BY tgl DESC
My sample data:
id   pmid      tgl         stamp             vavg    pf    energy
787  SDPEXT_2  2021-09-06  06:00:00.0000000  407.82  0.98  1408014.25
788  SDPEXT_2  2021-09-06  07:00:00.0000000  403.31  0.85  1408041.00
789  SDPEXT_2  2021-09-06  08:00:00.0000000  408.82  0.87  1408081.75
The result I want:
id   pmid      tgl         stamp             vavg    pf    energy
787  SDPEXT_2  2021-09-06  06:00:00.0000000  407.82  0.98  -2.675
788  SDPEXT_2  2021-09-06  07:00:00.0000000  403.31  0.85  -4.075
789  SDPEXT_2  2021-09-06  08:00:00.0000000  408.82  0.87  -11.012
Remove the GROUP BY from your query; you are not using any aggregate function.
If energy is already a numeric data type, don't convert it to float.
Use LEAD() to get the next row's value:
SELECT . . .
       (d.energy - LEAD(d.energy) OVER (PARTITION BY d.tgl
                                        ORDER BY d.id)) / 10
FROM datapm d
I'm not sure what the actual formula is, but looking at the expected result, you need to divide by 10 to obtain it.
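Putting those pieces together, a minimal sketch of the full query (assuming the same datapm columns shown in the sample) might look like:

SELECT
    d.id,
    d.tgl,
    CONVERT(CHAR(5), d.stamp, 108) AS stamp,
    d.pmid,
    d.vavg,
    d.pf,
    (d.energy - LEAD(d.energy) OVER (PARTITION BY d.tgl ORDER BY d.id)) / 10 AS energy
FROM datapm d
ORDER BY d.tgl DESC;

On the sample rows this gives (1408014.25 - 1408041.00) / 10 = -2.675 for id 787 and (1408041.00 - 1408081.75) / 10 = -4.075 for id 788, matching the desired output.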
My question is about SQL. I am looking to find the total revenue of a parent account for the last 12 months.
The data will look something like this:
revenue  name  month   year
10000    abc   201001  2010-01-12
10000    abc   201402  2014-02-14
2000     abc   201404  2014-04-12
3000     abc   201406  2014-06-30
30000    def   201301  2013-01-14
6000     def   201304  2013-04-12
9000     def   201407  2013-07-19
And the output should be something like this:
revenue  name  month   year        Running Sum
10000    abc   201001  2010-01-12  10000
10000    abc   201402  2014-02-14  10000
2000     abc   201404  2014-04-12  12000
3000     abc   201406  2014-06-30  15000
30000    def   201301  2013-01-14  30000
6000     def   201304  2013-04-12  36000
9000     def   201407  2013-07-19  45000
I have tried a window function something like this, which has the logic that I need:
select revenue, name, date, month,
       sum(revenue) over (partition by name
                          order by month
                          rows between '12 months' preceding and current row)
from table
but the above command gives a syntax error.
Redshift does not support intervals in the window frame specification.
So, convert to a number. A convenient one in this case is the number of months since some point in time:
select revenue, name, date, month,
       sum(revenue) over (partition by name
                          order by datediff(month, '1900-01-01', month)
                          range between 12 preceding and current row
                         )
from table;
I will note that your logic adds up data from 13 months, not 12. I suspect you want between 11 preceding and current row.
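A quick check against the sample data (assuming the month column is stored as a date): datediff(month, '1900-01-01', month) maps 2014-02 to (2014 - 1900) * 12 + 1 = 1369 and 2010-01 to 1320. Since 1369 - 1320 = 49, the 2010 row falls far outside the 12-preceding window and is excluded from abc's 201402 running sum, which is why that row shows 10000 rather than 20000.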
You can use rows between if you have data for all months:
sum(revenue) over (partition by name
                   order by datediff(month, '1900-01-01', month)
                   rows between 12 preceding and current row
                  )
I have a table which has the monthly sales values for each of the items. I need the last 3 months' average sales value next to the current month's sales for each item.
I need to perform this operation in Hive.
The sample input table looks like this:
Item_ID Sales Month
A 4295 Dec-2018
A 245 Nov-2018
A 1337 Oct-2018
A 3290 Sep-2018
A 2000 Aug-2018
B 856 Dec-2018
B 1694 Nov-2018
B 4286 Oct-2018
B 2780 Sep-2018
B 3100 Aug-2018
The result table should look like this:
Item_ID Sales_Current_Month Month Sales_Last_3_months_average
A 4295 Dec-2018 1624
A 245 Nov-2018 2209
B 856 Dec-2018 2920
B 1694 Nov-2018 3388.67
Assuming there is no missing month data, you can use the avg window function to do this:
select t.*,
       avg(sales) over (partition by item_id
                        order by month
                        rows between 3 preceding and 1 preceding) as avg_sales_prev_3_months
from tbl t
If the month column is in a format different from yyyyMM, use an appropriate conversion so the ordering works as expected.
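Since the sample stores months as strings like Dec-2018, one possible conversion is Hive's unix_timestamp() with a format pattern (a sketch only; tbl and the column names come from the sample above):

select t.*,
       avg(sales) over (partition by item_id
                        order by unix_timestamp(month, 'MMM-yyyy')
                        rows between 3 preceding and 1 preceding) as avg_sales_prev_3_months
from tbl t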
Please help me with a SQL Server Query that can bucket data dynamically into ranges.
Here is my source data:
Value
=======
45
33.5
33.1
33
32.8
25.3
25.2
25.1
25
21.3
21.2
21.1
20.9
12.3
12.2
12.15
12.1
12
11.8
Expected output:
Value Rank
=============
45 1
(mean value in this range is 45)
33.5 2
33.1 2
33 2
32.8 2
(mean value is 33.1 - any value in the range (-10%) 29.79 to 36.41 (+10%) should be given a rank of 2)
25.3 3
25.2 3
25.1 3
25 3
21.3 4
21.2 4
21.1 4
20.9 4
12.3 5
12.2 5
12.15 5
12.1 5
12 5
11.8 5
DENSE_RANK, RANK and NTILE do not seem to give me a ranking like this. The ranges are dynamic and not known in advance. Any help highly appreciated.
The bucketing rule is:
Each bucket contains the values that fall within ±10% of the bucket's mean value.
Here's one way:
select val, dense_rank() over (order by cast(val / 10 as int) desc) as ntile
from yourtable
Use dense_rank but specify your buckets in the order by clause. (I'm assuming this is how it works for your sample data)
First, convert the value to a number with 2 decimal places.
Then use a CASE expression to apply either the FLOOR or ROUND function, based on the first digit after the decimal point.
Finally, use the DENSE_RANK function to assign a rank based on the rounded value.
Query
select z.[Value],
       dense_rank() over (order by z.[val_rounded] desc) as [Rank]
from (
    select t.[Value],
           case when substring(t.[Value2], charindex('.', t.[Value2], 1) + 1, 1) > 5
                then round(t.[Value], 0)
                else floor(t.[Value])
           end as [val_rounded]
    from (
        select [Value],
               cast(cast([Value] as decimal(6, 2)) as varchar(50)) as [Value2]
        from [your_table_name]
    ) t
) z;
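To see how this reproduces the expected ranks, trace the 32.8–33.5 group: 32.8 has first decimal digit 8 > 5, so it is rounded up to 33; 33.5 has digit 5 (not greater than 5), so it is floored to 33; 33.1 and 33 also floor to 33. The whole group therefore shares the rounded value 33 and gets the same dense rank of 2.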