Group rows by an incrementing column in PostgreSQL

I have the table shown below, with only one column. What I want to achieve is to separate all rows that have no gap in x, for example the numbers 1-3, 5-6 and 8-9 (because the gaps are 4 and 7).
+---+
| x |
+---+
| 1 |
| 2 |
| 3 |
| 5 |
| 6 |
| 8 |
| 9 |
+---+
I would like to make it look like this: a table with two columns (a and b), indicating the ranges where there are no gaps in the previous column x. For every gap a new record is inserted. How would I go about it in PostgreSQL?
+---+---+
| a | b |
+---+---+
| 1 | 3 |
| 5 | 6 |
| 8 | 9 |
+---+---+

You can compare the sequence with gaps to a sequence without gaps:
select min(x), max(x)
from
 (
   select x,
          x - row_number() over (order by x) as dummy
   from tab
 ) as dt
group by dummy
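The intermediate values show why this works: the difference stays constant within a run of consecutive values and changes at every gap.
+---+------------+----------------+
| x | row_number | x - row_number |
+---+------------+----------------+
| 1 |          1 |              0 |  -- same value for consecutive values without gaps
| 2 |          2 |              0 |
| 3 |          3 |              0 |
| 5 |          4 |              1 |
| 6 |          5 |              1 |
| 8 |          6 |              2 |
| 9 |          7 |              2 |
+---+------------+----------------+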
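To try the query without a PostgreSQL server, here is a minimal sketch that runs it against an in-memory SQLite database through Python's sqlite3 module. SQLite is only a stand-in here (it needs version 3.25 or later for window functions), and the column aliases a and b and the final ORDER BY are additions that are not part of the answer above.

```python
import sqlite3

# In-memory SQLite stands in for PostgreSQL; the core query is unchanged.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tab (x INTEGER)")
conn.executemany(
    "INSERT INTO tab (x) VALUES (?)",
    [(v,) for v in (1, 2, 3, 5, 6, 8, 9)],
)

query = """
    SELECT min(x) AS a, max(x) AS b
    FROM (
        SELECT x,
               x - row_number() OVER (ORDER BY x) AS dummy
        FROM tab
    ) AS dt
    GROUP BY dummy
    ORDER BY a  -- added only to make the output order deterministic
"""
ranges = conn.execute(query).fetchall()
print(ranges)  # [(1, 3), (5, 6), (8, 9)]
```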

Related

Rolling Average in sqlite

I want to calculate a rolling average in a table and keep track of the starting time of each calculated window frame.
My problem is that I expect the result to have fewer rows than the table, but my query returns exactly the same number of rows. I think I understand why it does not work, but I don't know the remedy.
Let's say I have a table with example data that looks like this:
+------+-------+
| Tick | Value |
+------+-------+
| 1 | 1 |
| 2 | 3 |_
| 3 | 5 |
| 4 | 7 |_
| 5 | 9 |
| 6 | 11 |_
| 7 | 13 |
| 8 | 15 |_
| 9 | 17 |
| 10 | 19 |_
+------+-------+
I want to calculate the average of every n rows, for example of every two rows (see the marks above), so that I get a result of:
+--------------+--------------+
| OccurredTick | ValueAverage |
+--------------+--------------+
| 1 | 2 |
| 3 | 6 |
| 5 | 10 |
| 7 | 14 |
| 9 | 18 |
+--------------+--------------+
I tried that with
SELECT
    FIRST_VALUE(Tick) OVER (
        ROWS BETWEEN CURRENT ROW AND 1 FOLLOWING
    ) OccurredTick,
    AVG(Value) OVER (
        ROWS BETWEEN CURRENT ROW AND 1 FOLLOWING
    ) ValueAverage
FROM TableName;
What I get in return is:
+--------------+--------------+
| OccurredTick | ValueAverage |
+--------------+--------------+
| 1 | 2 |
| 2 | 4 |
| 3 | 6 |
| 4 | 8 |
| 5 | 10 |
| 6 | 12 |
| 7 | 14 |
| 8 | 16 |
| 9 | 18 |
| 10 | 19 |
+--------------+--------------+
You could use aggregation. If tick is always increasing with no gaps:
select min(tick), avg(value) avg_value
from mytable
group by cast((tick - 1) / 2 as integer)
You can change 2 to whatever group size suits you best.
If tick is not sequentially increasing, we can generate a gap-free sequence with row_number():
select min(tick), avg(value) avg_value
from (
    select t.*, row_number() over(order by tick) rn
    from mytable t
) t
group by cast((rn - 1) / 2 as integer)
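The row_number() variant can be checked against the sample data with a short sketch using Python's sqlite3 module (SQLite 3.25 or later for window functions). The alias occurred_tick and the ORDER BY are additions for readability and a deterministic order, not part of the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (tick INTEGER, value INTEGER)")
# Sample data from the question: tick 1..10, value 1, 3, 5, ..., 19.
conn.executemany(
    "INSERT INTO mytable (tick, value) VALUES (?, ?)",
    [(t, 2 * t - 1) for t in range(1, 11)],
)

query = """
    SELECT min(tick) AS occurred_tick, avg(value) AS avg_value
    FROM (
        SELECT t.*, row_number() OVER (ORDER BY tick) AS rn
        FROM mytable t
    ) t
    GROUP BY CAST((rn - 1) / 2 AS INTEGER)  -- rows 1-2, 3-4, ... share a bucket
    ORDER BY occurred_tick
"""
averages = conn.execute(query).fetchall()
print(averages)  # [(1, 2.0), (3, 6.0), (5, 10.0), (7, 14.0), (9, 18.0)]
```

Unlike the window-function attempt in the question, GROUP BY collapses each bucket into a single row, which is why the row count drops from ten to five.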

hive - split a row into multiple rows between the range of values

I have the table below and would like to split each row into one row per value in the range between the start and end columns,
i.e. id and value should repeat for each value between start and end (both inclusive).
--------------------------------------
id | value | start | end
--------------------------------------
1 | 5 | 1 | 4
2 | 8 | 5 | 9
--------------------------------------
Desired output
--------------------------------------
id | value | current
--------------------------------------
1 | 5 | 1
1 | 5 | 2
1 | 5 | 3
1 | 5 | 4
2 | 8 | 5
2 | 8 | 6
2 | 8 | 7
2 | 8 | 8
2 | 8 | 9
--------------------------------------
I can write my own UDF in Java/Python to get this result, but I would like to check whether I can implement it in Hive SQL using existing Hive UDFs.
Thanks in advance.
This can be accomplished with a recursive common table expression, which Hive doesn't support.
One option is to create a table of numbers and use it to generate rows between start and end.
create table numbers
location 'hdfs_location' as
select row_number() over(order by somecolumn) as num
from some_table --this can be any table with the desired number of rows
;
--Join it with the existing table
select t.id, t.value, n.num as `current`
from tbl t
join numbers n on n.num >= t.start and n.num <= t.`end`
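The numbers-table join can be tried out with a small sketch on an in-memory SQLite database via Python's sqlite3 module. SQLite only stands in for Hive here, and filling numbers with 1 to 20 from Python is an assumption made for the demo; in Hive the table would be built as shown above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "start", "end" and "current" are quoted because some are SQL keywords.
conn.execute(
    'CREATE TABLE tbl (id INTEGER, value INTEGER, "start" INTEGER, "end" INTEGER)'
)
conn.executemany("INSERT INTO tbl VALUES (?, ?, ?, ?)", [(1, 5, 1, 4), (2, 8, 5, 9)])

# Helper table of consecutive integers; it must cover the largest "end" value.
conn.execute("CREATE TABLE numbers (num INTEGER)")
conn.executemany("INSERT INTO numbers VALUES (?)", [(n,) for n in range(1, 21)])

rows = conn.execute(
    """
    SELECT t.id, t.value, n.num AS "current"
    FROM tbl t
    JOIN numbers n ON n.num >= t."start" AND n.num <= t."end"
    ORDER BY t.id, n.num
    """
).fetchall()
for row in rows:
    print(row)  # one row per number in each [start, end] range
```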
You can also do this with the posexplode() function, which emits each array element together with its position:
WITH
data AS (
    SELECT 1 AS id, 5 AS value, 1 AS start, 4 AS `end`
    UNION ALL
    SELECT 2 AS id, 8 AS value, 5 AS start, 9 AS `end`
)
SELECT distinct id, value, (zr.start + rge.diff) as `current`
FROM data zr LATERAL VIEW posexplode(split(space(zr.`end` - zr.start), ' ')) rge as diff, x
Here is the output:
+-----+--------+----------+--+
| id | value | current |
+-----+--------+----------+--+
| 1 | 5 | 1 |
| 1 | 5 | 2 |
| 1 | 5 | 3 |
| 1 | 5 | 4 |
| 2 | 8 | 5 |
| 2 | 8 | 6 |
| 2 | 8 | 7 |
| 2 | 8 | 8 |
| 2 | 8 | 9 |
+-----+--------+----------+--+
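The trick in this query is that a string of (end - start) spaces splits into (end - start + 1) pieces, so exploding it with positions yields the offsets 0 to end - start. The following plain-Python sketch only emulates that idea to make the arithmetic visible; it is not Hive code, and expand_range is a hypothetical helper name.

```python
def expand_range(start, end):
    """Emulate posexplode(split(space(end - start), ' ')) for one row."""
    padding = " " * (end - start)   # space(end - start)
    parts = padding.split(" ")      # end - start + 1 empty strings
    # posexplode pairs each element with its 0-based position;
    # only the position is used, as an offset from start.
    return [start + pos for pos, _ in enumerate(parts)]


data = [(1, 5, 1, 4), (2, 8, 5, 9)]  # (id, value, start, end)
rows = [
    (row_id, value, current)
    for row_id, value, start, end in data
    for current in expand_range(start, end)
]
print(rows)
```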

Limit a sorted number of rows joined

I have two tables, A and B, and a join table M. For each A.id, I want to get the top 2 B.id's, sorted on the value in table M, producing the results below. This is running on an Azure SQL database.
Table A Table M Table B
+-----+ +-----+-----+-------+ +-----+
| Id | | AId | BId | Value | | Id |
+-----+ +-----+-----+-------+ +-----+
| 1 | | 1 | 3 | 4 | | 1 |
| 2 | | 1 | 2 | 3 | | 2 |
| 3 | | 3 | 2 | 3 | | 3 |
| 4 | | 3 | 5 | 6 | | 4 |
+-----+ | 3 | 3 | 4 | | 5 |
| 4 | 1 | 2 | +-----+
| 4 | 2 | 1 |
| 4 | 4 | 3 |
+-----+-----+-------+
Result
+-----+-----+-------+
| AId | BId | Value |
+-----+-----+-------+
| 1 | 3 | 4 |
| 1 | 2 | 3 |
| 3 | 5 | 6 |
| 3 | 3 | 4 |
| 4 | 1 | 2 |
| 4 | 4 | 3 |
+-----+-----+-------+
I know that I can select all the M.AId rows where they equal 1, sort it, and limit by 2, but I need to do this for every row in Table A. I've made an attempt to use group by, but I wasn't sure how to sort and limit it. I've also tried to search for resources associated with this issue but I couldn't find any resources.
(I also wasn't sure how to word the title for this issue)
You can just use ROW_NUMBER:
SELECT
    AId, BId, Value
FROM (
    SELECT *,
        Rn = ROW_NUMBER() OVER(PARTITION BY AId ORDER BY Value DESC)
    FROM M
) t
WHERE Rn <= 2
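As a quick check of the per-group numbering, here is a sketch that runs the same idea on an in-memory SQLite database through Python's sqlite3 module (SQLite 3.25 or later). The `Rn = ...` alias form is T-SQL syntax, so this version uses `AS rn` instead, and the outer ORDER BY is an addition for a deterministic order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE M (AId INTEGER, BId INTEGER, Value INTEGER)")
conn.executemany(
    "INSERT INTO M VALUES (?, ?, ?)",
    [(1, 3, 4), (1, 2, 3), (3, 2, 3), (3, 5, 6),
     (3, 3, 4), (4, 1, 2), (4, 2, 1), (4, 4, 3)],
)

query = """
    SELECT AId, BId, Value
    FROM (
        SELECT *,
               ROW_NUMBER() OVER (PARTITION BY AId ORDER BY Value DESC) AS rn
        FROM M
    ) t
    WHERE rn <= 2          -- keep the two highest values per AId
    ORDER BY AId, Value DESC
"""
top_two = conn.execute(query).fetchall()
print(top_two)
# [(1, 3, 4), (1, 2, 3), (3, 5, 6), (3, 3, 4), (4, 4, 3), (4, 1, 2)]
```

If two rows in the same AId tie on Value, ROW_NUMBER picks between them arbitrarily; adding a tie-breaker such as BId to the window's ORDER BY would make that choice stable.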

Select distinct combinations of values

I have a table with X values and Y values, both INT. What I want to do is group on the X value with the condition that it contains a distinct combination of Y values. I also want to see the total number of each combination.
I tried using SUM ( POWER (2, Y)), but that generates numbers that are too big as Y can get up to about 300 in some cases.
+--------------+--------------+
| X | Y |
+--------------+--------------+
| 1 | 1 |
| 1 | 2 |
| 1 | 4 |
| 1 | 6 |
| 2 | 1 |
| 2 | 2 |
| 2 | 4 |
| 2 | 6 |
| 3 | 2 |
| 3 | 3 |
| 3 | 5 |
| 4 | 2 |
| 4 | 3 |
| 4 | 5 |
| 5 | 2 |
| 5 | 3 |
| 5 | 6 |
+--------------+--------------+
I want the result to look something like:
+--------------+--------------+
| X | COUNT |
+--------------+--------------+
| 1 | 2 |
| 3 | 2 |
| 5 | 1 |
+--------------+--------------+
Based on your description (but not on your sample data), the following query should do:
select X, count(distinct Y)
from TBL
group by X
Thanks for trying to help. I realize that it might have been hard to understand what I was trying to do.
Anyway, I ended up solving it with the checksum_agg aggregate function.
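For comparison, the grouping the question asks for can be sketched in plain Python by using the exact set of Y values per X as the grouping key. This is a swapped-in illustration, not the checksum_agg solution mentioned above; a checksum can in principle give the same value for different sets, whereas comparing the sets themselves cannot.

```python
from collections import defaultdict

# (X, Y) rows from the question's sample table.
rows = [
    (1, 1), (1, 2), (1, 4), (1, 6),
    (2, 1), (2, 2), (2, 4), (2, 6),
    (3, 2), (3, 3), (3, 5),
    (4, 2), (4, 3), (4, 5),
    (5, 2), (5, 3), (5, 6),
]

# Step 1: collect the set of Y values for each X.
ys_by_x = defaultdict(set)
for x, y in rows:
    ys_by_x[x].add(y)

# Step 2: group the X values that share exactly the same set of Y values.
xs_by_combo = defaultdict(list)
for x, ys in ys_by_x.items():
    xs_by_combo[frozenset(ys)].append(x)

# Step 3: report the smallest X of each combination and how many X share it.
result = sorted((min(xs), len(xs)) for xs in xs_by_combo.values())
print(result)  # [(1, 2), (3, 2), (5, 1)]
```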

Relational Algebra - Divide

If I have the following tables and I perform R1/R2 in relational algebra, would the result be a table with A values 1 and 3? I am a bit confused as I know 3 would be a result as it contains both 5 and 1, but the result 1 has additional values for B aside from the matching ones so would this also be included and why?
R1 R2
+---+---+ +---+
| A | B | | B |
|---|---| |---|
| 1 | 1 | | 5 |
| 1 | 2 | | 1 |
| 1 | 3 | +---+
| 1 | 4 |
| 2 | 3 |
| 2 | 4 |
| 3 | 5 |
| 3 | 1 |
| 1 | 5 |
| 5 | 7 |
| 5 | 8 |
+---+---+
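In relational algebra, DIVIDE is defined as:
R1(Y,X) DIVIDE R2(X) = R1[Y] MINUS ((R1[Y] TIMES R2) MINUS R1)[Y]
Remember that R1[Y] is another way of writing "PROJECT R1 over Y". Here Y is the attribute A and X is the attribute B.
So the result is {1, 3}. The value 1 is included because division only requires that an A value be paired with every B value in R2 (here 5 and 1); additional B values do not disqualify it.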
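The definition can be followed step by step with Python sets. This is a minimal sketch using the tables from the question; the variable names are illustrative only.

```python
# R1 as a set of (A, B) pairs and R2 as a set of B values, from the question.
r1 = {(1, 1), (1, 2), (1, 3), (1, 4), (2, 3), (2, 4),
      (3, 5), (3, 1), (1, 5), (5, 7), (5, 8)}
r2 = {5, 1}

all_a = {a for a, _ in r1}                      # R1[Y]
required = {(a, b) for a in all_a for b in r2}  # R1[Y] TIMES R2
missing = required - r1                         # (R1[Y] TIMES R2) MINUS R1
disqualified = {a for a, _ in missing}          # (...)[Y]
quotient = all_a - disqualified                 # R1[Y] MINUS (...)[Y]

print(sorted(quotient))  # [1, 3]
```

A = 2 and A = 5 drop out because at least one required pair, such as (2, 5) or (5, 1), is missing from R1, while the extra pairs of A = 1 never enter the calculation.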