Replicate constant output based on the occurrence of specific events - sql

I have a table of events (say X, Y, Z are random events and A and B are the ones I want to track). If I find event A, I want to output 1 on the current and all following rows; if I find B, I want to output -1 on the current and all following rows; before I find either A or B, I output 0. How do I do that using Hive (SQL)?
event | output | ordercol
------+--------+---------
  X   |    0   |    1
  Y   |    0   |    2
  Z   |    0   |    3
  B   |   -1   |    4
  X   |   -1   |    5
  X   |   -1   |    6
  B   |   -1   |    7
  X   |   -1   |    8
  A   |    1   |    9
  X   |    1   |   10
  B   |   -1   |   11
  Z   |   -1   |   12
I know this could be accomplished using joins, but I'm looking for a more elegant solution (maybe using window functions; I've tried dense_rank() and row_number() with no success).

According to this documentation, you can use last_value() with its skip-nulls flag and some additional logic:
select event,
       (case last_value(case when event in ('A', 'B') then event end, true)
                 over (order by ordercol)
             when 'A' then 1
             when 'B' then -1
             else 0
        end) as output,
       ordercol
from e;
This capability is called IGNORE NULLS in the standard and in other databases.
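For intuition, the same carry-forward logic can be sketched as a single pass in plain Python (a toy stand-in for the window function, using the sample rows from the question):

```python
# Carry forward the most recent A/B marker; output stays 0 until the
# first A or B appears. Sample rows are (event, ordercol) from the question.
events = [("X", 1), ("Y", 2), ("Z", 3), ("B", 4), ("X", 5), ("X", 6),
          ("B", 7), ("X", 8), ("A", 9), ("X", 10), ("B", 11), ("Z", 12)]

rows = []
output = 0  # nothing seen yet
for event, ordercol in events:
    if event == "A":
        output = 1
    elif event == "B":
        output = -1
    rows.append((event, output, ordercol))

for row in rows:
    print(row)
```

This reproduces the output column from the question's table: 0 until the B at ordercol 4, then -1 until the A at ordercol 9, and so on.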

Related

In Flink, how to convert a cumulative value into an incremental value and then aggregate by some keys

How can I convert a cumulative value into an incremental value in Flink (certain keys identify a user, and each increment is the difference between two adjacent cumulative values for that user), and then aggregate (sum) the increments by a time dimension and one key?
For example, the original data is:
time | A | B | value
-----+---+---+------
  0  | 1 | 1 |   1
  0  | 2 | 2 |   2
  0  | 1 | 1 |   4
  0  | 2 | 2 |   3
  1  | 1 | 1 |   5
  1  | 2 | 2 |   6
After converting to incremental values, we get:
time | A | B | value
-----+---+---+------
  0  | 1 | 1 |   1
  0  | 2 | 2 |   2
  0  | 1 | 1 |   3
  0  | 2 | 2 |   1
  1  | 1 | 1 |   1
  1  | 2 | 2 |   3
Then we aggregate by (time, A); the final result is:
time | A | value
-----+---+------
  0  | 1 |   4
  0  | 2 |   3
  1  | 1 |   1
  1  | 2 |   3
Is there a way to do both steps in one program?
One solution is to use a session window or global window to convert the original table into an incremental table, store it somewhere else, and then start another task to aggregate the results, but that consumes additional storage.
Sorry for my poor English, and thanks for your advice.
There's no need to have two separate applications, or to store anything. Just let the output of the first step flow into the second step. Conceptually that's
results = input
.somehowDoTheIncrementalPart()
.thenAggregate();
or in SQL you could use a nested query, something like
SELECT ts, userId, sum(diff) FROM (
  SELECT ts, userId, diff
  FROM events
  MATCH_RECOGNIZE (
    PARTITION BY id
    ORDER BY ts
    MEASURES
      p2.v - p1.v AS diff, p2.id AS userId, p2.ts AS ts
    AFTER MATCH SKIP TO LAST p2
    PATTERN (p1 p2)
    DEFINE p1 AS TRUE, p2 AS TRUE )
) GROUP BY ts, userId
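The one-pass shape (increments per user, then a keyed sum) can be sketched in plain Python; this is only a toy stand-in for the streaming job, using the question's sample rows and dictionary names of my own choosing, assuming each increment is the current cumulative value minus the previous one for the same (A, B) key:

```python
# Toy one-pass sketch: turn cumulative values into increments per (A, B) key,
# then sum the increments by (time, A), without storing an intermediate table.
rows = [(0, 1, 1, 1), (0, 2, 2, 2), (0, 1, 1, 4),
        (0, 2, 2, 3), (1, 1, 1, 5), (1, 2, 2, 6)]

last_seen = {}   # (A, B) -> previous cumulative value
totals = {}      # (time, A) -> summed increments

for t, a, b, value in rows:
    prev = last_seen.get((a, b), 0)  # first value counts as its own increment
    last_seen[(a, b)] = value
    totals[(t, a)] = totals.get((t, a), 0) + (value - prev)

print(totals)
```

The same chaining happens in Flink when the pattern-matching output flows directly into the GROUP BY, with `last_seen` playing the role of keyed state.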

Fast inclusion of the previous value of another type

I have a table of the following structure:
Ordinal | Type
--------+-----
   1    |  A
   2    |  B
   3    |  A
   4    |  B
   5    |  B
   6    |  B
   7    |  A
There are two types and the order according to the ordinal matters. I want the following result:
Ordinal | Type | Last_A
--------+------+-------
   1    |  A   |   1
   2    |  B   |   1
   3    |  A   |   3
   4    |  B   |   3
   5    |  B   |   3
   6    |  B   |   3
   7    |  A   |   7
The new column Last_A should contain the last seen Ordinal for which Type = A, where "last" is relative to the order of the Ordinal column. There may be an arbitrary number of B rows before another A row. Is there an efficient way of achieving this result? A cursor would easily produce the desired result, but it is not feasible given the large number of rows I work with.
You can use a conditional cumulative max():
select t.*,
       max(case when t.type = 'A' then ordinal end)
           over (order by ordinal) as last_A
from t;
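As a quick way to try this, SQLite has supported window functions since version 3.25, so the conditional cumulative max can be run from Python's sqlite3 module (assuming a sufficiently recent bundled SQLite) against the question's sample data:

```python
import sqlite3

# Demo of the conditional cumulative max on the question's sample rows.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (ordinal INTEGER, type TEXT)")
con.executemany("INSERT INTO t VALUES (?, ?)",
                [(1, "A"), (2, "B"), (3, "A"), (4, "B"),
                 (5, "B"), (6, "B"), (7, "A")])

# The default frame (unbounded preceding to current row) makes the MAX
# a running maximum over A-ordinals only; B rows contribute NULL.
rows = con.execute("""
    SELECT t.*,
           MAX(CASE WHEN t.type = 'A' THEN ordinal END)
               OVER (ORDER BY ordinal) AS last_A
    FROM t
""").fetchall()

for row in rows:
    print(row)  # (ordinal, type, last_A)
```

Because Ordinal increases monotonically, the running max of A-ordinals is exactly the most recent A row, which is what makes this trick work.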

Convert cell value to respective column in PostgreSQL

Here's the sample:
select * from tmp
--output
A | B | Value
--+---+------
a | x |   1
b | x |   2
a | y |   3
b | y |   4
c | y |   5
After grouping on column B, I'd like to make each value of column A a separate column, as illustrated below:
B | a | b | c
--+---+---+-----
x | 1 | 2 | null
y | 3 | 4 |  5
Is there any specific terminology for this transformation? Thanks!
This transformation is commonly called a pivot (or crosstab). You take the max of the value within each group of the anchor column (B in your case). Note that you need one conditional column per distinct value expected in column A:
select b,
       max(case when A = 'a' then Value end) as a,
       max(case when A = 'b' then Value end) as b,
       max(case when A = 'c' then Value end) as c
from tmp
group by 1;
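For a runnable sketch, the same conditional-aggregation pivot works unchanged in SQLite, so it can be tried from Python's sqlite3 module with the question's sample rows (GROUP BY 1 replaced by the column name for portability):

```python
import sqlite3

# The conditional-aggregation pivot, run against the question's sample data.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE tmp (A TEXT, B TEXT, Value INTEGER)")
con.executemany("INSERT INTO tmp VALUES (?, ?, ?)",
                [("a", "x", 1), ("b", "x", 2),
                 ("a", "y", 3), ("b", "y", 4), ("c", "y", 5)])

rows = con.execute("""
    SELECT B,
           MAX(CASE WHEN A = 'a' THEN Value END) AS a,
           MAX(CASE WHEN A = 'b' THEN Value END) AS b,
           MAX(CASE WHEN A = 'c' THEN Value END) AS c
    FROM tmp
    GROUP BY B
    ORDER BY B
""").fetchall()

for row in rows:
    print(row)  # ('x', 1, 2, None) then ('y', 3, 4, 5)
```

In PostgreSQL specifically, the tablefunc extension's crosstab() function offers a built-in alternative when the set of output columns is known up front.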

Split a column into two columns based on filter value in query

Is it possible to split a column into two columns when executing a query, based on a filter value?
For example, given the database schema
x  varchar
a  integer
b  integer (0 or 1)
I can execute a query
select x, sum(a) as asum
from t
group by x;
That will end up with
x | asum
---+-----
...| ...
Now I want to calculate separate sums of a for b = 0 and for b = 1, so the result set should be
x | asum for b = 0 | asum for b = 1
---+----------------+----------------
...| ...            | ...
Of course I could execute
select x, b, sum(a) as asum
from t
group by x, b;
but that would require an additional application-side transformation.
In most databases, you can do conditional aggregation:
select x,
       sum(case when b = 0 then a else 0 end) as a_0,
       sum(case when b = 1 then a else 0 end) as a_1
from t
group by x;
Voltdb appears to support case (here), so this should work.
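Here is a small sqlite3 sketch of that conditional aggregation; the sample rows are made up purely for illustration:

```python
import sqlite3

# Conditional aggregation demo; the sample rows below are hypothetical,
# matching the schema from the question (x varchar, a integer, b 0/1).
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (x TEXT, a INTEGER, b INTEGER)")
con.executemany("INSERT INTO t VALUES (?, ?, ?)",
                [("p", 10, 0), ("p", 5, 1), ("p", 2, 0), ("q", 7, 1)])

rows = con.execute("""
    SELECT x,
           SUM(CASE WHEN b = 0 THEN a ELSE 0 END) AS a_0,
           SUM(CASE WHEN b = 1 THEN a ELSE 0 END) AS a_1
    FROM t
    GROUP BY x
    ORDER BY x
""").fetchall()

for row in rows:
    print(row)  # one row per x, with the b=0 and b=1 sums side by side
```

Each CASE expression routes a row's value of a into exactly one of the two sums, which is why a single GROUP BY x produces both columns at once.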

PLSQL or SSRS, How to select having all values in a group?

I have a table like this.
ID | NAME | VALUE
---+------+------
 1 |  A   |  X
 2 |  A   |  Y
 3 |  A   |  Z
 4 |  B   |  X
 5 |  B   |  Y
 6 |  C   |  X
 7 |  C   |  Z
 8 |  D   |  Z
 9 |  E   |  X
And the query:
SELECT * FROM TABLE1 T WHERE T.VALUE IN ('X', 'Z')
This query gives me
ID | NAME | VALUE
---+------+------
 1 |  A   |  X
 3 |  A   |  Z
 4 |  B   |  X
 6 |  C   |  X
 7 |  C   |  Z
 8 |  D   |  Z
 9 |  E   |  X
But I want to see all rows for the names that have all of the parameters. Only A and C have both X and Z values, so my desired result is:
ID | NAME | VALUE
---+------+------
 1 |  A   |  X
 2 |  A   |  Y
 3 |  A   |  Z
 6 |  C   |  X
 7 |  C   |  Z
How can I get the desired result? SQL or the reporting service, either is fine. Maybe a "GROUP BY ... HAVING" clause will help, but I'm not sure.
By the way, I don't know how many parameters will be in the list.
I really appreciate any help.
The standard approach would be something like
SELECT id, name, value
FROM table1 a
WHERE name IN (SELECT name
               FROM table1 b
               WHERE b.value IN ('X', 'Z')
               GROUP BY name
               HAVING COUNT(DISTINCT value) = 2)
That would require that you determine how many values are in the list so that you can use 2 in the HAVING clause if there are 2 elements, 5 if there are 5 elements, etc. You could also use analytic functions:
SELECT id, name, value
FROM (SELECT id,
             name,
             value,
             count(distinct value) over (partition by name) cnt
      FROM table1 t1
      WHERE t1.value IN ('X', 'Z'))
WHERE cnt = 2
I prefer to structure these "sets within sets" queries as an aggregation. I find this is the most flexible approach:
select t.*
from t
where t.name in (select name
                 from t
                 group by name
                 having sum(case when value = 'X' then 1 else 0 end) > 0 and
                        sum(case when value = 'Z' then 1 else 0 end) > 0
                )
The subquery for the in finds all names that have at least one X value and at least one Z value. Using the same logic, it is easy to adjust for other conditions (X and Y and Z; X and Z but not Y; and so on). The outer query then returns all the rows for those names instead of just the names.
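A runnable sketch of this approach, using Python's sqlite3 module and the question's sample data with X and Z as the parameters:

```python
import sqlite3

# "Sets within sets" via HAVING: keep every row of each name that has
# at least one X and at least one Z, using the question's sample rows.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE table1 (id INTEGER, name TEXT, value TEXT)")
con.executemany("INSERT INTO table1 VALUES (?, ?, ?)",
                [(1, "A", "X"), (2, "A", "Y"), (3, "A", "Z"),
                 (4, "B", "X"), (5, "B", "Y"), (6, "C", "X"),
                 (7, "C", "Z"), (8, "D", "Z"), (9, "E", "X")])

rows = con.execute("""
    SELECT t.*
    FROM table1 t
    WHERE t.name IN (SELECT name
                     FROM table1
                     GROUP BY name
                     HAVING SUM(CASE WHEN value = 'X' THEN 1 ELSE 0 END) > 0
                        AND SUM(CASE WHEN value = 'Z' THEN 1 ELSE 0 END) > 0)
    ORDER BY t.id
""").fetchall()

for row in rows:
    print(row)
```

Only names A and C satisfy both HAVING conditions, and the outer query keeps all of their rows, including A's Y row, matching the desired result in the question.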