Grouping the given prices into ZONEs - SQL

1834.14
1834.00
1831.72
1828.61
1828.34
1825.70
1814.09
1813.84
1813.74
1803.58
1802.84
1797.87
1797.30
1795.70
I would like to make a ZONE of the above prices, where values belong to the same zone when consecutive values differ by <= 3.00.
sample result:

Here is one way, using the LAG() analytic function:
WITH cte AS (
SELECT *, CASE WHEN LAG(val) OVER (ORDER BY val DESC) - val > 3
THEN 1 ELSE 0 END AS label
FROM yourTable
),
cte2 AS (
SELECT val, SUM(label) OVER (ORDER BY val DESC) AS zone
FROM cte
)
SELECT MIN(val) AS min_val, MAX(val) AS max_val, zone
FROM cte2
GROUP BY zone
ORDER BY MAX(val) DESC;
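For the sample prices above, applying the <= 3.00 rule, this should return five zones:
min_val   max_val   zone
1831.72   1834.14   0
1825.70   1828.61   1
1813.74   1814.09   2
1802.84   1803.58   3
1795.70   1797.87   4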

Conditional running sum

I'm trying to return the number of unique users that converted over time.
So I have the following query:
WITH CTE
As
(
SELECT '2020-04-01' as date,'userA' as user,1 as goals Union all
SELECT '2020-04-01','userB',0 Union all
SELECT '2020-04-01','userC',0 Union all
SELECT '2020-04-03','userA',1 Union all
SELECT '2020-04-05','userC',1 Union all
SELECT '2020-04-06','userC',0 Union all
SELECT '2020-04-06','userB',0
)
select
date,
COUNT(DISTINCT IF(goals >= 1, user, NULL)) AS cad_converters
from CTE
group by date
I'm trying to count distinct users, but I need to find a way to apply the distinct count to the whole date. I probably need to do something like a cumulative sum...
The expected result would be something like this:
date, goals, total_unique_converted_users
'2020-04-01',1,1
'2020-04-01',0,1
'2020-04-01',0,1
'2020-04-03',1,2
'2020-04-05',1,2
'2020-04-06',0,2
'2020-04-06',0,2
Below is for BigQuery Standard SQL
#standardSQL
SELECT t.date, t.goals, total_unique_converted_users
FROM `project.dataset.table` t
LEFT JOIN (
SELECT a.date,
COUNT(DISTINCT IF(b.goals >= 1, b.user, NULL)) AS total_unique_converted_users
FROM `project.dataset.table` a
CROSS JOIN `project.dataset.table` b
WHERE a.date >= b.date
GROUP BY a.date
)
USING(date)
I would approach this by tagging when the first goal is scored for each user. Then simply do a cumulative sum:
select cte.* except (seqnum), countif(seqnum = 1) over (order by date)
from (select cte.*,
(case when goals = 1 then row_number() over (partition by user, goals order by date) end) as seqnum
from cte
) cte;
I realize this can be expressed without the case in the subquery:
select cte.* except (seqnum), countif(seqnum = 1 and goals = 1) over (order by date)
from (select cte.*,
row_number() over (partition by user, goals order by date) as seqnum
from cte
) cte;
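For reference, here is a self-contained version of that second query over the question's sample data (a sketch, assuming BigQuery Standard SQL):
#standardSQL
WITH CTE AS (
SELECT '2020-04-01' AS date, 'userA' AS user, 1 AS goals UNION ALL
SELECT '2020-04-01', 'userB', 0 UNION ALL
SELECT '2020-04-01', 'userC', 0 UNION ALL
SELECT '2020-04-03', 'userA', 1 UNION ALL
SELECT '2020-04-05', 'userC', 1 UNION ALL
SELECT '2020-04-06', 'userC', 0 UNION ALL
SELECT '2020-04-06', 'userB', 0
)
select cte.* except (seqnum),
countif(seqnum = 1 and goals = 1) over (order by date) as total_unique_converted_users
from (select cte.*,
row_number() over (partition by user, goals order by date) as seqnum
from CTE cte
) cte;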

Finding max row_number inside a window function

I'm trying to calculate an Exponential Moving Average of 3 periods without the use of any loops. I got the math down and in order to calculate it, I have to do something like:
EMA(t) = SUM( Value(t) * K * (1 - K) ^ (n - t) )
Where EMA(t) is the moving average, n is the number of items to sum, t is the item and K is a constant.
So, I tried something like this in T-SQL.
select EMA03 = SUM( xValue * (0.5) * POWER( 0.5, MAX(rn) - rn ) ) OVER ( PARTITION BY nClient ORDER BY myDate ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW )
from ( select myDate
, xValue
, nClient
, rn = ROW_NUMBER() OVER ( PARTITION BY nClient ORDER BY myDate )
from myTable ) A
But the problem is that I can't nest MAX(rn) inside another window function. I have to somehow figure out how many rows the OVER clause contains and use that in my function. Is there any way to do it?
How about defining the count in the subquery?
select EMA03 = SUM( xValue * (0.5) * POWER( 0.5, cnt - rn ) ) OVER
( PARTITION BY nClient
ORDER BY myDate
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
)
from (select myDate, xValue, nClient,
ROW_NUMBER() OVER (PARTITION BY nClient ORDER BY myDate) as rn,
count(*) over (partition by nClient) as cnt
from myTable
) A
Try this and see if it works for you. SQL Server does not allow a subquery inside a window aggregate, so the per-client row count is computed with a correlated subquery in the derived table instead:
select EMA03 = SUM( xValue * 0.5 * POWER( 0.5, cnt - rn ) )
OVER ( PARTITION BY nClient ORDER BY myDate
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW )
from ( select myDate
, xValue
, nClient
, rn = ROW_NUMBER() OVER ( PARTITION BY nClient ORDER BY myDate )
, cnt = ( select count(*) from myTable where nClient = t.nClient )
from myTable t ) A
You probably just need to add another layer of sub-query.
And while we're at it, let's use CTEs for readability's sake.
WITH CTE1 AS
(
SELECT myDate, xValue, nClient
, rn = ROW_NUMBER() OVER (PARTITION BY nClient ORDER BY myDate)
FROM myTable
),
CTE2 AS
(
SELECT c.*, max_rn = MAX(rn) OVER (PARTITION BY nClient)
FROM CTE1 c
)
SELECT c.*
, EMA03 = SUM(xValue * 0.5 * POWER(0.5, max_rn - rn)) OVER (PARTITION BY nClient ORDER BY myDate ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
FROM CTE2 c;
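A quick way to sanity-check this is a small script with hypothetical data (a sketch; the table and values are made up). Note that 0.5E0 is a float literal: POWER() keeps the type of its base, so a plain 0.5 (decimal(1,1)) would silently round the weights.
CREATE TABLE #myTable (nClient int, myDate date, xValue float);
INSERT INTO #myTable VALUES
(1, '2020-01-01', 10), (1, '2020-01-02', 20), (1, '2020-01-03', 40);
WITH CTE1 AS
(
SELECT myDate, xValue, nClient
, rn = ROW_NUMBER() OVER (PARTITION BY nClient ORDER BY myDate)
FROM #myTable
),
CTE2 AS
(
SELECT c.*, max_rn = MAX(rn) OVER (PARTITION BY nClient)
FROM CTE1 c
)
SELECT c.*
, EMA03 = SUM(xValue * 0.5E0 * POWER(0.5E0, max_rn - rn))
OVER (PARTITION BY nClient ORDER BY myDate ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
FROM CTE2 c;
-- Last row: 10*0.5*0.25 + 20*0.5*0.5 + 40*0.5*1 = 26.25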

Calculate percent changes in contiguous ranges in Postgresql

I need to calculate the price percent change in contiguous ranges. For example, if the price starts moving up or down and I have a sequence of increasing or decreasing values, I need to grab the first and last value of that sequence and calculate the change.
I'm using the window lag function to calculate the direction; my problem is that I can't generate a unique RANK for the sequences to calculate the percent changes.
I tried combinations of RANK, ROW_NUMBER, etc. with no luck.
Here's my query
WITH partitioned AS (
SELECT
*,
lag(price, 1) over(ORDER BY time) AS lag_price
FROM prices
),
sequenced AS (
SELECT
*,
CASE
WHEN price > lag_price THEN 'up'
WHEN price < lag_price THEN 'down'
ELSE 'equal'
END
AS direction
FROM partitioned
),
ranked AS (
SELECT
*,
-- Here is the problem:
-- I need to calculate a unique rnk value for each sequence
DENSE_RANK() OVER ( PARTITION BY direction ORDER BY time) + ROW_NUMBER() OVER ( ORDER BY time DESC) AS rnk
-- DENSE_RANK() OVER ( PARTITION BY seq ORDER BY time),
-- ROW_NUMBER() OVER ( ORDER BY seq, time DESC),
-- ROW_NUMBER() OVER ( ORDER BY seq),
-- RANK() OVER ( ORDER BY seq)
FROM sequenced
),
changed AS (
SELECT *,
FIRST_VALUE(price) OVER(PARTITION BY rnk ) first_price,
LAST_VALUE(price) OVER(PARTITION BY rnk ) last_price,
(LAST_VALUE(price) OVER(PARTITION BY rnk ) / FIRST_VALUE(price) OVER(PARTITION BY rnk ) - 1) * 100 AS percent_change
FROM ranked
)
SELECT
*
FROM changed
ORDER BY time DESC;
and SQLFiddle with sample data
If anyone is interested, here's a solution from another forum:
with ct1 as /* detecting direction: up, down, equal */
(
select
price, time,
case
when lag(price) over (order by time) < price then 'up'
when lag(price) over (order by time) > price then 'down'
else 'equal'
end as dir
from
prices
)
, ct2 as /* setting reset points */
(
select
price, time, dir,
case
when coalesce(lag(dir) over (order by time), 'none') <> dir
then 1 else 0
end as rst
from
ct1
)
, ct3 as /* making groups */
(
select
price, time, dir,
sum(rst) over (order by time) as grp
from
ct2
)
select /* calculates min, max price per group */
price, time, dir,
min(price) over (partition by grp) as min_price,
max(price) over (partition by grp) as max_price
from
ct3
order by
time desc;
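If the percent change itself is needed, the final SELECT above could be replaced with something along these lines (a sketch; the frame is widened so last_value sees the whole group):
select
price, time, dir,
first_value(price) over w as first_price,
last_value(price) over w as last_price,
(last_value(price) over w / first_value(price) over w - 1) * 100 as percent_change
from ct3
window w as (partition by grp order by time
rows between unbounded preceding and unbounded following)
order by time desc;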

Max dates for each sequence within partitions

I would like to see if somebody has an idea how to get the max and min dates within each 'id', using the 'row_num' column as an indicator of when a sequence starts/ends, in SQL Server 2016.
The screenshot below shows the desired output in columns 'min_date' and 'max_date'.
Any help would be appreciated.
You could use windowed MIN/MAX:
WITH cte AS (
SELECT *,SUM(CASE WHEN row_num > 1 THEN 0 ELSE 1 END)
OVER(PARTITION BY id, cat ORDER BY date_col) AS grp
FROM tab
)
SELECT *, MIN(date_col) OVER(PARTITION BY id, cat, grp) AS min_date,
MAX(date_col) OVER(PARTITION BY id, cat, grp) AS max_date
FROM cte
ORDER BY id, date_col, cat;
Rextester Demo
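To make the grouping trick concrete, here is a small sketch with hypothetical data (table and values are made up); row_num restarts at 1 whenever a new sequence begins:
CREATE TABLE #tab (id int, cat varchar(10), date_col date, row_num int);
INSERT INTO #tab VALUES
(1, 'A', '2019-01-01', 1), (1, 'A', '2019-01-02', 2),
(1, 'A', '2019-01-05', 1), (1, 'A', '2019-01-06', 2);
WITH cte AS (
SELECT *, SUM(CASE WHEN row_num > 1 THEN 0 ELSE 1 END)
OVER(PARTITION BY id, cat ORDER BY date_col) AS grp
FROM #tab
)
SELECT *, MIN(date_col) OVER(PARTITION BY id, cat, grp) AS min_date,
MAX(date_col) OVER(PARTITION BY id, cat, grp) AS max_date
FROM cte
ORDER BY id, date_col, cat;
-- The first two rows fall into grp = 1 (2019-01-01 .. 2019-01-02),
-- the last two into grp = 2 (2019-01-05 .. 2019-01-06).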
Try something like
SELECT
Q1.id, Q1.cat,
MIN(Q1.[date]) AS min_date,
MAX(Q1.[date]) AS max_date
FROM
(SELECT
*,
ROW_NUMBER() OVER (PARTITION BY id, cat ORDER BY [date]) AS r1,
ROW_NUMBER() OVER (PARTITION BY id ORDER BY [date]) AS r2
FROM tab) AS Q1
GROUP BY
Q1.id, Q1.cat, Q1.r2 - Q1.r1
The difference r2 - r1 stays constant within each run of consecutive rows of the same (id, cat) and changes as soon as the category changes, so it identifies each sequence.

Hive transformation

I am trying to make a simple hive transformation.
Can someone provide me a way to do this? I have tried collect_set and am currently looking at Klout's open source UDFs.
I think this gives you what you want. I wasn't able to run it and debug it though. Good luck!
select start_points.unit
, start_time as start
, start_time + min(stop_time - start_time) as stop
from
(select * from
(select date_time as start_time
, unit
, last_value(unit) over (order by date_time desc rows between current row and 1 following) as previous_unit
from table
) previous
where unit <> previous_unit
) start_points
left outer join
(select * from
(select date_time as stop_time
, unit
, last_value(unit) over (order by date_time rows between current row and 1 following) as next_unit
from table
) next
where unit <> next_unit
) stop_points
on start_points.unit = stop_points.unit
where stop_time > start_time
group by start_points.unit, start_time
;
What about using the min and max functions? I think the following will get you what you need:
SELECT
Unit,
MIN(datetime) as start,
MAX(datetime) as stop
from table_name
group by Unit
;
I found it. Thanks for the pointer to use window functions:
select *
from
(select *,
case when lag(unit,1) over (partition by id order by effective_time_ut desc) is NULL THEN 1
when unit<>lag(unit,1) over (partition by id order by effective_time_ut desc) then 1
when lead(unit,1) over (partition by id order by effective_time_ut desc) is NULL then 1
else 0 end as different_loc
from units_we_care) a
where different_loc=1
create table temptable as
select unit, start_date, end_time, row_number() over () as row_num
from (select unit, min(date_time) as start_date, max(date_time) as end_time
from table
group by unit) a;
select a.unit, a.start_date as start_date, nvl(b.start_date, a.end_time) as end_time
from temptable a
left outer join temptable b on (a.row_num + 1) = b.row_num;