I've been trying to find an answer to this for the better part of a day with no luck.
I have a SQL table with measurement data for samples, and I need a way to assign a unique ID to each sample. Right now each sample has an ID number, but it rolls over frequently, so it is not unique. Below is a simplified dataset, along with an example of a UID that would do what I need.
| Row | Time | Meas# | Sample# | UID (Desired) |
| 1 | 09:00 | 1 | 1 | 1 |
| 2 | 09:01 | 2 | 1 | 1 |
| 3 | 09:02 | 3 | 1 | 1 |
| 4 | 09:07 | 1 | 2 | 2 |
| 5 | 09:08 | 2 | 2 | 2 |
| 6 | 09:09 | 3 | 2 | 2 |
| 7 | 09:24 | 1 | 3 | 3 |
| 8 | 09:25 | 2 | 3 | 3 |
| 9 | 09:25 | 3 | 3 | 3 |
| 10 | 09:47 | 1 | 1 | 4 |
| 11 | 09:47 | 2 | 1 | 4 |
| 12 | 09:49 | 3 | 1 | 4 |
My problem is that rows 10-12 have the same Sample# as rows 1-3. I need a way to uniquely identify and group each sample. Having the row number or time of the first measurement on the sample would be good.
One other complication is that the measurement number doesn't always start with 1. It's based on measurement locations, and sometimes it skips location 1 and only has locations 2 and 3.
I am going to speculate that you want a unique number assigned to each sample, where now you have repeats.
If so, you can use lag() and a cumulative sum:
select t.*,
       sum(case when prev_sample = sample then 0 else 1 end) over (order by row) as new_sample_number
from (select t.*,
             lag(sample) over (order by row) as prev_sample
      from t
     ) t;
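To check the lag() plus cumulative sum idea against the sample data from the question, here is a minimal sketch that runs the same query shape in an in-memory SQLite database from Python. The table name `measurements` and the column names `row_no`, `meas`, and `sample` are assumptions made for the demo, and it assumes a SQLite build with window functions (3.25 or later).

```python
import sqlite3

# Rows from the question as (row_no, meas, sample); the names are illustrative.
ROWS = [
    (1, 1, 1), (2, 2, 1), (3, 3, 1),
    (4, 1, 2), (5, 2, 2), (6, 3, 2),
    (7, 1, 3), (8, 2, 3), (9, 3, 3),
    (10, 1, 1), (11, 2, 1), (12, 3, 1),
]

QUERY = """
select row_no,
       sum(case when prev_sample = sample then 0 else 1 end)
           over (order by row_no) as uid
from (select row_no, sample,
             lag(sample) over (order by row_no) as prev_sample
      from measurements) as m
order by row_no
"""


def assign_uids(rows):
    """Return (row_no, uid) pairs; uid increases whenever sample changes."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table measurements (row_no integer, meas integer, sample integer)"
    )
    conn.executemany("insert into measurements values (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


uids = [uid for _, uid in assign_uids(ROWS)]
print(uids)
```

Because the comparison only looks at Sample#, it does not matter whether Meas# starts at 1. One caveat: two consecutive samples that happen to share the same Sample# would be merged into one group.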
I have the table below and would like to expand each row into one row per value in the range from the start column to the end column.
That is, id and value should repeat for each value between start and end (both inclusive).
--------------------------------------
id | value | start | end
--------------------------------------
1 | 5 | 1 | 4
2 | 8 | 5 | 9
--------------------------------------
Desired output
--------------------------------------
id | value | current
--------------------------------------
1 | 5 | 1
1 | 5 | 2
1 | 5 | 3
1 | 5 | 4
2 | 8 | 5
2 | 8 | 6
2 | 8 | 7
2 | 8 | 8
2 | 8 | 9
--------------------------------------
I can write my own UDF in Java/Python to get this result, but I would like to know whether it can be done in Hive SQL using existing Hive UDFs.
Thanks in advance.
This can be accomplished with a recursive common table expression, which Hive doesn't support.
One option is to create a table of numbers and use it to generate rows between start and end.
create table numbers
location 'hdfs_location' as
select row_number() over (order by somecolumn) as num
from some_table -- this can be any table with at least the desired number of rows
;
-- Join it with the existing table (`end` and `current` are quoted because they are reserved words)
select t.id, t.value, n.num as `current`
from tbl t
join numbers n on n.num >= t.start and n.num <= t.`end`;
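As a rough illustration of the numbers-table approach, the sketch below reproduces the join in an in-memory SQLite database from Python rather than in Hive. The helper `expand_ranges`, its `max_num` parameter, and the output column name `current_num` are assumptions for the demo; the numbers table only needs to cover the largest `end` value.

```python
import sqlite3


def expand_ranges(rows, max_num=100):
    """Expand (id, value, start, end) rows into one row per number in the range."""
    conn = sqlite3.connect(":memory:")
    # "start" and "end" are quoted because END is an SQL keyword.
    conn.execute(
        'create table tbl (id integer, value integer, "start" integer, "end" integer)'
    )
    conn.executemany("insert into tbl values (?, ?, ?, ?)", rows)

    # Helper table holding the integers 1..max_num.
    conn.execute("create table numbers (num integer primary key)")
    conn.executemany(
        "insert into numbers values (?)", [(n,) for n in range(1, max_num + 1)]
    )

    out = conn.execute(
        """
        select t.id, t.value, n.num as current_num
        from tbl t
        join numbers n on n.num between t."start" and t."end"
        order by t.id, n.num
        """
    ).fetchall()
    conn.close()
    return out


expanded = expand_ranges([(1, 5, 1, 4), (2, 8, 5, 9)])
for row in expanded:
    print(row)
```

If `max_num` is smaller than the largest `end`, the missing rows are silently dropped, so the numbers table should be sized generously.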
You can do this using the posexplode() function.
WITH
data AS (
SELECT 1 AS id, 5 AS value, 1 AS start, 4 AS `end`
UNION ALL
SELECT 2 AS id, 8 AS value, 5 AS start, 9 AS `end`
)
SELECT distinct id, value, (zr.start+rge.diff) as `current`
FROM data zr LATERAL VIEW posexplode(split(space(zr.`end`-zr.start),' ')) rge as diff, x
Here is its output:
+-----+--------+----------+--+
| id | value | current |
+-----+--------+----------+--+
| 1 | 5 | 1 |
| 1 | 5 | 2 |
| 1 | 5 | 3 |
| 1 | 5 | 4 |
| 2 | 8 | 5 |
| 2 | 8 | 6 |
| 2 | 8 | 7 |
| 2 | 8 | 8 |
| 2 | 8 | 9 |
+-----+--------+----------+--+
I've got a table like this
|week_no|value|attribute|
-------------------------
| 1 | 3 | a |
| 2 | 3 | a |
| 3 | 3 | a |
| 1 | 4 | b |
| 2 | 4 | b |
| 3 | 4 | b |
I'd like to have a running (cumulative) total of the value column for each attribute
|week_no|value|attribute|accum_value|
-------------------------------------
| 1 | 3 | a | 3 |
| 2 | 3 | a | 6 |
| 3 | 3 | a | 9 |
| 1 | 4 | b | 4 |
| 2 | 4 | b | 8 |
| 3 | 4 | b | 12 |
I've attempted this with the following window function, but it doesn't seem to return the correct values
SUM(value) OVER(ORDER BY 1 ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS accum_value
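The window needs to partition by attribute and order by week_no; ordering by the constant 1 does not define a row order, and without partition by the total runs across both attributes:
SUM(value) OVER (PARTITION BY attribute
                 ORDER BY week_no
                 ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
                ) AS accum_value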
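A small sketch to confirm the partitioned running total on the sample data, again run through an in-memory SQLite database from Python. The table name `weekly` is an assumption, and the SQLite build needs window-function support (3.25 or later).

```python
import sqlite3

# (week_no, value, attribute) rows from the question.
ROWS = [
    (1, 3, "a"), (2, 3, "a"), (3, 3, "a"),
    (1, 4, "b"), (2, 4, "b"), (3, 4, "b"),
]

QUERY = """
select week_no, value, attribute,
       sum(value) over (partition by attribute
                        order by week_no
                        rows between unbounded preceding and current row
                       ) as accum_value
from weekly
order by attribute, week_no
"""


def running_totals(rows):
    """Return rows extended with a per-attribute cumulative sum of value."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table weekly (week_no integer, value integer, attribute text)")
    conn.executemany("insert into weekly values (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


totals = running_totals(ROWS)
for row in totals:
    print(row)
```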
Thank you for your attention.
I have a table called "PROD_COST" with 5 fields
(ID, Duration, Cost, COST_NEXT, COST_CHANGE).
I need an extra field called "groups" for aggregation.
Duration = number of days the price is valid (1 day = 1 row).
Cost = product price on that day.
Cost_next = lead(cost,1,0).
Cost_change = Cost_next - Cost.
Now I need to group by Cost_change, which can be
positive, negative, or 0.
+----+----------+------+-----------+-------------+
| ID | Duration | Cost | COST_NEXT | COST_CHANGE |
+----+----------+------+-----------+-------------+
| 1  | 1        | 10   | 8,5       | -1,5        |
| 2  | 1        | 8,5  | 12,2      | 3,7         |
| 3  | 1        | 12,2 | 5,3       | -6,9        |
| 4  | 1        | 5,3  | 4,2       | 1,2         |
| 5  | 1        | 4,2  | 6,2       | 2           |
| 6  | 1        | 6,2  | 9,2       | 3           |
| 7  | 1        | 9,2  | 7,5       | -2,7        |
| 8  | 1        | 7,5  | 6,2       | -1,3        |
| 9  | 1        | 6,2  | 6,3       | 0,1         |
| 10 | 1        | 6,3  | 7,2       | 0,9         |
| 11 | 1        | 7,2  | 7,5       | 0,3         |
| 12 | 1        | 7,5  | 0         | 7,5         |
+----+----------+------+-----------+-------------+
I need a query that starts a new group every time the sign of Cost_change flips between negative and positive (+ - + - + -). The last field below is what I want.
Sorry for my English.
+----+----------+------+-----------+-------------+--------+
| ID | Duration | Cost | COST_NEXT | COST_CHANGE | groups |
+----+----------+------+-----------+-------------+--------+
| 1  | 1        | 10   | 8,5       | -1,5        | 1      |
| 2  | 1        | 8,5  | 12,2      | 3,7         | 2      |
| 3  | 1        | 12,2 | 5,3       | -6,9        | 3      |
| 4  | 1        | 5,3  | 4,2       | 1,2         | 4      |
| 5  | 1        | 4,2  | 6,2       | 2           | 4      |
| 6  | 1        | 6,2  | 9,2       | 3           | 4      |
| 7  | 1        | 9,2  | 7,5       | -2,7        | 5      |
| 8  | 1        | 7,5  | 6,2       | -1,3        | 5      |
| 9  | 1        | 6,2  | 6,3       | 0,1         | 6      |
| 10 | 1        | 6,3  | 7,2       | 0,9         | 6      |
| 11 | 1        | 7,2  | 7,5       | 0,3         | 6      |
| 12 | 1        | 7,5  | 0         | 7,5         | 6      |
+----+----------+------+-----------+-------------+--------+
If you're on SQL Server 2012 or later, you can use window functions to do this:
select
    id, COST_CHANGE, sum(GRP) over (order by id asc) + 1
from
(
    select
        *,
        case when sign(COST_CHANGE) != sign(isnull(lag(COST_CHANGE)
                  over (order by id asc), COST_CHANGE)) then 1 else 0 end as GRP
    from
        PROD_COST
) X
Lag will get the value from previous row, check the sign of it and compare it to the current row. If the values don't match, the case will return 1. The outer select will do a running total of these numbers, and every time there is 1, it will increase the total.
It is possible to use the same logic with older versions too, you'll just have to fetch the previous row from the table using the id and do running total by re-calculating all rows before the current one.
Example in SQL Fiddle
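For versions without lag(), the fallback described above (fetch the previous row by id, then recount all sign changes up to the current row) could look roughly like the sketch below. It runs in an in-memory SQLite database from Python rather than SQL Server, spells out sign() as a case expression, and assumes the ids are consecutive so that `id - 1` is always the previous row; all of these are assumptions made for the demo.

```python
import sqlite3

# (id, cost_change) pairs taken from the question's table.
CHANGES = [
    (1, -1.5), (2, 3.7), (3, -6.9), (4, 1.2), (5, 2.0), (6, 3.0),
    (7, -2.7), (8, -1.3), (9, 0.1), (10, 0.9), (11, 0.3), (12, 7.5),
]

# No window functions: the previous row comes from a self-join on id - 1,
# and the running total is recomputed per row by a correlated subquery.
QUERY = """
select c.id,
       1 + (select count(*)
            from prod_cost cur
            join prod_cost prev on prev.id = cur.id - 1
            where cur.id <= c.id
              and (case when cur.cost_change > 0 then 1
                        when cur.cost_change < 0 then -1 else 0 end)
               <> (case when prev.cost_change > 0 then 1
                        when prev.cost_change < 0 then -1 else 0 end)
           ) as grp
from prod_cost c
order by c.id
"""


def sign_groups(changes):
    """Return the group number for each row, in id order."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table prod_cost (id integer primary key, cost_change real)")
    conn.executemany("insert into prod_cost values (?, ?)", changes)
    result = [grp for _, grp in conn.execute(QUERY)]
    conn.close()
    return result


groups = sign_groups(CHANGES)
print(groups)
```

The correlated subquery rescans earlier rows for every output row, so this is quadratic in the table size; it is only a reasonable choice when window functions are unavailable.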
James's answer is close, but it doesn't handle the zero value correctly. This is a pretty easy modification. One trick is to look at the difference between the signs of consecutive changes:
select id, COST_CHANGE, sum(IsNewGroup) over (order by id asc) as grp
from (select pc.*,
             (case when sign(cost_change) - sign(lag(cost_change) over (order by id)) between -1 and 1
                   then 0
                   else 1 -- `NULL` (the first row) intentionally goes here, so numbering starts at 1
              end) as IsNewGroup
      from Prod_Cost pc
     ) pc
For clarity, here is a SQL Fiddle with zero values. From my understanding of the question, the OP only wants an actual sign change.
This may still not be correct. The OP simply is not clear about what to do about 0 values.
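To make the ambiguity around zeros concrete, here is a hedged sketch that computes both groupings side by side in an in-memory SQLite database from Python: a "strict" one that starts a new group whenever the sign differs from the previous row, and a "lenient" one that uses the difference-of-signs test, so a move to or from zero does not start a group. The input values and all names are made up for illustration, and window functions (SQLite 3.25 or later) are assumed.

```python
import sqlite3

QUERY = """
select id,
       sum(strict_flag)  over (order by id) as strict_grp,
       sum(lenient_flag) over (order by id) as lenient_grp
from (select id,
             -- new group whenever the sign differs (NULL on the first row -> 1)
             case when sgn = lag(sgn) over (order by id)
                  then 0 else 1 end as strict_flag,
             -- new group only on a direct + to - or - to + flip
             case when sgn - lag(sgn) over (order by id) between -1 and 1
                  then 0 else 1 end as lenient_flag
      from (select id,
                   case when cost_change > 0 then 1
                        when cost_change < 0 then -1 else 0 end as sgn
            from prod_cost) s
     ) f
order by id
"""


def compare_groupings(changes):
    """Return (strict_groups, lenient_groups) for a list of cost changes."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table prod_cost (id integer primary key, cost_change real)")
    conn.executemany(
        "insert into prod_cost values (?, ?)", list(enumerate(changes, start=1))
    )
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return [r[1] for r in rows], [r[2] for r in rows]


strict, lenient = compare_groupings([2, 0, 3, -1, 0, -2, 4])
print(strict)
print(lenient)
```

Note that the lenient test also misses a flip that passes through zero: for the changes 2, 0, -3 every step differs by only one sign unit, so all three rows land in the same group. Which behaviour is wanted depends on how the OP intends zeros to be treated.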
I am experiencing an issue with Oracle analytic functions.
I want the rank to be assigned sequentially but in a cyclic fashion, and the ranking should happen within each group.
Say I have 10 groups.
Within each of the 10 groups, rows must be ranked from 1 to 9. Past 9, the rank must start again from 1 and continue for however many rows remain.
emp id   date1        date 2       Rank
123      13/6/2012    13/8/2021    1
123      14/2/2012    12/8/2014    2
.
.
123      9/10/2013    12/12/2015   9
123      16/10/2013   15/10/2013   1
123      16/3/2014    15/9/2015    2
In the above example, the rows for emp id 123 are ranked in two subgroups: the first nine rows are ranked sequentially from 1 to 9, and for the remaining rows the rank starts again from 1. How can I achieve this with Oracle's ranking functions?
As per the suggestion from Egor Skriptunoff above:
select
      empid, date1, date2
    , row_number() over(order by date1, date2) as "rank"
    , mod(row_number() over(order by date1, date2) - 1, 9) + 1 as "cycle_9"
from yourtable
Example result:
| empid | date1 | date2 | rank | cycle_9 |
|-------|----------------------|----------------------|------|---------|
| 72232 | 2016-10-26T00:00:00Z | 2017-03-07T00:00:00Z | 1 | 1 |
| 04365 | 2016-11-03T00:00:00Z | 2017-07-29T00:00:00Z | 2 | 2 |
| 79203 | 2016-12-15T00:00:00Z | 2017-05-16T00:00:00Z | 3 | 3 |
| 68638 | 2016-12-18T00:00:00Z | 2017-02-08T00:00:00Z | 4 | 4 |
| 75784 | 2016-12-24T00:00:00Z | 2017-11-18T00:00:00Z | 5 | 5 |
| 72836 | 2016-12-24T00:00:00Z | 2018-09-10T00:00:00Z | 6 | 6 |
| 03679 | 2017-01-24T00:00:00Z | 2017-10-14T00:00:00Z | 7 | 7 |
| 43527 | 2017-02-12T00:00:00Z | 2017-01-15T00:00:00Z | 8 | 8 |
| 03138 | 2017-02-26T00:00:00Z | 2017-01-30T00:00:00Z | 9 | 9 |
| 89758 | 2017-03-29T00:00:00Z | 2018-04-12T00:00:00Z | 10 | 1 |
| 86377 | 2017-04-14T00:00:00Z | 2018-10-07T00:00:00Z | 11 | 2 |
| 49169 | 2017-04-28T00:00:00Z | 2017-04-21T00:00:00Z | 12 | 3 |
| 45523 | 2017-05-03T00:00:00Z | 2017-05-07T00:00:00Z | 13 | 4 |
SQL Fiddle
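Since the question asks for the cycle to restart within each group, one possible variation is to add a partition by empid to the row_number() window. The sketch below tries that in an in-memory SQLite database from Python, where the `%` operator stands in for Oracle's mod(); the table name `emp_dates` and the generated dates are made up for the demo.

```python
import sqlite3

# Hypothetical data: employee 123 has 11 rows, employee 456 has 3.
ROWS = [(123, f"2012-01-{d:02d}") for d in range(1, 12)] + [
    (456, f"2012-02-{d:02d}") for d in range(1, 4)
]

QUERY = """
select empid, date1,
       (row_number() over (partition by empid order by date1) - 1) % 9 + 1
           as cycle_9
from emp_dates
order by empid, date1
"""


def cyclic_ranks(rows):
    """Return {empid: [cycle_9, ...]} with ranks cycling 1..9 per employee."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table emp_dates (empid integer, date1 text)")
    conn.executemany("insert into emp_dates values (?, ?)", rows)
    ranks = {}
    for empid, _, cycle in conn.execute(QUERY):
        ranks.setdefault(empid, []).append(cycle)
    conn.close()
    return ranks


ranks = cyclic_ranks(ROWS)
print(ranks)
```

In Oracle the equivalent expression would presumably be mod(row_number() over (partition by empid order by date1, date2) - 1, 9) + 1, following the query above.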