How to check max from range in cursor? - sql

I have a problem translating an Excel formula to SQL. My Excel formula is: =IF(P2<(MAX($P$2:P2));"Move";"").
The P column in excel is a sequence of numbers.
a | b
------
1
2
7
3 MOVE
4 MOVE
8
9
5 MOVE
10
You can find more examples in this screenshot:
I created a cursor with a loop, but I don't know how to check the max over a range.
For example, when I iterate over the fourth row, I have to check the max of rows 1-4, and so on.

No need for a cursor and a loop. Assuming that you have a column that defines the ordering of the rows (say, id), you can use window functions:
select t.*,
case when a < max(a) over(order by id) then 'MOVE' end as b
from mytable t

One option is the MAX() analytic function. In any case, you need an extra column such as id to define the ordering, so that the max value from the first row up to the current row can be determined, since SQL tables represent unordered sets. If you have such an id column, with values ordered as in your sample data, then consider using
WITH t2 AS
(
SELECT MAX(a) OVER (ORDER BY id ROWS BETWEEN
UNBOUNDED PRECEDING
AND
CURRENT ROW) AS max_upto_this_row,
t.*
FROM t
)
SELECT a, CASE WHEN max_upto_this_row > a THEN 'Move' END AS b
FROM t2
ORDER BY id;
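As a quick check of the running-max approach from the answers above, here is a minimal sketch that runs the window query against an in-memory SQLite database from Python. The table and column names (mytable, id, a) are assumptions, as is the use of SQLite: the question does not name a database, and window functions need SQLite 3.25 or newer.

```python
import sqlite3

# Hypothetical table and column names (mytable, id, a); the values are the
# sample sequence from the question.
values = [1, 2, 7, 3, 4, 8, 9, 5, 10]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, a INTEGER)")
conn.executemany(
    "INSERT INTO mytable (id, a) VALUES (?, ?)",
    list(enumerate(values, start=1)),
)

# With ORDER BY and no explicit frame, the window covers the first row up to
# the current row (and its peers), so MAX(a) acts as a running maximum here
# because id is unique.
rows = conn.execute(
    """
    SELECT a,
           CASE WHEN a < MAX(a) OVER (ORDER BY id) THEN 'MOVE' END AS b
    FROM mytable
    ORDER BY id
    """
).fetchall()

for a, b in rows:
    print(a, b or "")
```

The rows flagged MOVE should be the ones with a = 3, 4 and 5, matching the sample in the question.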

Related

Selecting all rows that have a value in the top N rows

Consider this simple table of event counts:
event_name | count
---------------------
viewLoaded | 20
viewUnloaded | 17
buttonTapped | 12
viewScrolled | 12
networkSuccess | 9
linkTapped | 9
networkFailure | 2
leapSecond | 0
I would like to select the top N events by count, but with the additional requirement that if the result set includes any event with a particular count, then it should include all of the events with that count. In other words, I don’t want to break up any of the “groups” of rows that have the same count. Instead, I will potentially get more rows than I asked for.
For example, if I wanted the “top five” events in the table above, the query would actually return six rows so that both events with count 9 were included. The query for the top four would return four rows, and the query for the top three would also return four rows.
How can I accomplish this in SQLite?
You can use the RANK window function for this task. Its ranking value is equal for identical values, but it takes the number of preceding rows into account when assigning the next rank, so the rank after a tie skips ahead. The query below assumes the count column is named count_.
WITH cte AS (
SELECT *, RANK() OVER(ORDER BY count_ DESC) AS rn
FROM events
)
SELECT event_name, count_ FROM cte WHERE rn <= 5
One way is to use a common table expression to identify the counts corresponding to the “top five” events:
with top_five as (
select count from events order by count desc limit 5
)
select * from events where count in top_five order by count desc;
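To see how the RANK-based query behaves on the sample data, here is a small sketch using Python's sqlite3 module. The function name top_n_with_ties is hypothetical, the column is named count_ as in the RANK answer, and the sketch assumes SQLite 3.25 or newer for window function support.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (event_name TEXT, count_ INTEGER)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [
        ("viewLoaded", 20), ("viewUnloaded", 17), ("buttonTapped", 12),
        ("viewScrolled", 12), ("networkSuccess", 9), ("linkTapped", 9),
        ("networkFailure", 2), ("leapSecond", 0),
    ],
)


def top_n_with_ties(n):
    """Return the top-n events by count_, keeping groups of equal counts whole."""
    return conn.execute(
        """
        WITH cte AS (
            SELECT event_name, count_,
                   RANK() OVER (ORDER BY count_ DESC) AS rn
            FROM events
        )
        SELECT event_name, count_
        FROM cte
        WHERE rn <= ?
        ORDER BY count_ DESC, event_name
        """,
        (n,),
    ).fetchall()


for n in (3, 4, 5):
    print(n, len(top_n_with_ties(n)))
```

Tied rows share a rank and the next rank skips ahead, so asking for the top five returns both events with count 9, and asking for the top three returns both events with count 12.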

SQL: Apply sequence number to a column based on nth occurrence of each distinct value

I have a table with a column of values where each value occurs a variable number of times (i.e., one value may occur 1 time, and another value may occur 3 times). I need to add a column that identifies the occurrence sequence # of its corresponding value.
Input Table
SOURCE_VAL
a
a
b
c
c
c
Output table
SEQUENCE_VAL | SOURCE_VAL
-------------------------
1 | a
2 | a
1 | b
1 | c
2 | c
3 | c
What would the SQL for this be to generate the SEQUENCE_VAL column based on SOURCE_VAL?
You are looking for row_number(). Without an ordering column, you can use:
select t.*,
row_number() over (partition by source_val order by source_val) as sequence_val
from t
order by source_val, sequence_val;
Note: This assumes that you do not care about the ordering of the value. If you have another column that does specify the ordering for each source_val, then use that in the order by.
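The row_number() query can be tried on the sample input with a short sketch like the one below. It assumes SQLite 3.25 or newer through Python's sqlite3 module; the table name t and column name source_val follow the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (source_val TEXT)")
# Sample input from the question: a, a, b, c, c, c
conn.executemany("INSERT INTO t VALUES (?)", [(v,) for v in "aabccc"])

# Numbering restarts at 1 for every distinct source_val (the partition).
rows = conn.execute(
    """
    SELECT row_number() OVER (PARTITION BY source_val
                              ORDER BY source_val) AS sequence_val,
           source_val
    FROM t
    ORDER BY source_val, sequence_val
    """
).fetchall()

for sequence_val, source_val in rows:
    print(sequence_val, source_val)
```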

Windowing function in Hive

I am exploring windowing functions in Hive and I understand what each of the UDFs does. However, I do not understand the partition by and order by clauses that are used with these functions. The following is very similar in structure to the query I am planning to build.
SELECT a, RANK() OVER(partition by b order by c) as d from xyz;
Just trying to understand the background process involved for both keywords.
Appreciate the help :)
RANK() is an analytic function that assigns a rank to each row within its partition of the dataset.
The PARTITION BY clause determines how the rows are distributed (between reducers, in the case of Hive).
ORDER BY determines how the rows are sorted within each partition.
The first phase is the distribution: all rows in the dataset are distributed into partitions. In map-reduce, each mapper groups rows according to the partition by and produces files for each partition. The mapper also does an initial sort of its partition parts according to the order by.
In the second phase, all rows are sorted inside each partition.
In map-reduce, each reducer gets the partition files (parts of partitions) produced by the mappers and sorts the rows of the whole partition (a sort of the partial results) according to the order by.
Third, the rank function assigns a rank to each row in a partition. The rank function is initialized for each partition.
For the first row in the partition the rank starts at 1. Each following row gets a rank equal to its position in the partition, unless its order by values equal those of the previous row, in which case it receives the same rank. So after rows that share a rank, the next rank is not consecutive.
Different partitions can be processed in parallel on different reducers. Small partitions can be processed on the same reducer. The rank function re-initializes when it crosses a partition boundary and starts with rank=1 for each partition.
Example (rows are already partitioned and sorted inside partitions):
SELECT a, RANK() OVER(partition by b order by c) as d from xyz;
a, b, c, d(rank)
----------------
1 1 1 1 --starts with 1
2 1 1 1 --the same c value, the same rank=1
3 1 2 3 --rank 2 is skipped because second row shares the same rank as first
4 2 3 1 --New partition starts with 1
5 2 4 2
6 2 5 3
If you need consecutive ranks, use dense_rank function. dense_rank will produce rank=2 for the third row in the above dataset.
row_number function will assign a position number to each row in the partition starting with 1. Rows with equal values will receive different consecutive numbers.
SELECT a, ROW_NUMBER() OVER(partition by b order by c) as d from xyz;
a, b, c, d(row_number)
----------------
1 1 1 1 --starts with 1
2 1 1 2 --the same c value, row number=2
3 1 2 3 --row position=3
4 2 3 1 --New partition starts with 1
5 2 4 2
6 2 5 3
Important note: For rows with the same order by values, row_number and other such analytic functions may behave non-deterministically and produce different numbers from run to run. The first row in the above dataset may receive number 2 and the second row number 1, or vice versa, because their order is not determined unless you add one more column, a, to the order by clause. In that case all rows will always have the same row_number from run to run, because their order by values are all different.
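The three functions can be compared side by side on the example dataset. The sketch below uses SQLite through Python's sqlite3 module rather than Hive, which is an assumption made only so the example is self-contained. The window syntax is the same, but nothing here reflects how Hive distributes the work. The row_number call includes a in the order by, as suggested in the note above, so its output is deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE xyz (a INTEGER, b INTEGER, c INTEGER)")
# Same rows as the example: two partitions (b = 1 and b = 2), with a tie on c
# between the first two rows.
conn.executemany(
    "INSERT INTO xyz VALUES (?, ?, ?)",
    [(1, 1, 1), (2, 1, 1), (3, 1, 2), (4, 2, 3), (5, 2, 4), (6, 2, 5)],
)

rows = conn.execute(
    """
    SELECT a,
           RANK()       OVER (PARTITION BY b ORDER BY c)    AS rnk,
           DENSE_RANK() OVER (PARTITION BY b ORDER BY c)    AS dense_rnk,
           ROW_NUMBER() OVER (PARTITION BY b ORDER BY c, a) AS row_num
    FROM xyz
    ORDER BY a
    """
).fetchall()

for row in rows:
    print(row)
```

For the third row (a = 3), rank gives 3, dense_rank gives 2 and row_number gives 3, which is the difference described above.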

How to subtract the content of a column of two rows

I have a table like this
and I want to return the difference between the two rows
SQL tables represent unordered sets. There is no ordering, unless a column specifies the ordering.
So, you can get the two values using MAX() and MIN(). This should do what you want:
select max(nbaction) - min(nbaction)
from t;
EDIT:
Given your actual problem, you have multiple choices. Here is one:
SELECT (SELECT nbaction
FROM analyse_page_fait
WHERE operateurdimid = 2
ORDER BY datedimid DESC
FETCH FIRST 1 ROW ONLY
) -
(SELECT nbaction
FROM analyse_page_fait
WHERE operateurdimid = 2
ORDER BY datedimid DESC
OFFSET 1
FETCH FIRST 1 ROW ONLY
) as diff
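The two-subquery idea can be sketched in SQLite through Python as follows. This is an illustration under assumptions: the rows inserted below are made-up sample values, and SQLite's LIMIT ... OFFSET is used in place of FETCH FIRST, which SQLite does not support.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE analyse_page_fait ("
    "operateurdimid INTEGER, datedimid INTEGER, nbaction INTEGER)"
)
# Hypothetical sample rows, not taken from the question.
conn.executemany(
    "INSERT INTO analyse_page_fait VALUES (?, ?, ?)",
    [(2, 1, 10), (2, 2, 25), (2, 3, 40), (1, 3, 99)],
)

# Latest nbaction minus the second-latest one for operateurdimid = 2,
# where "latest" is defined by datedimid.
diff = conn.execute(
    """
    SELECT (SELECT nbaction
            FROM analyse_page_fait
            WHERE operateurdimid = 2
            ORDER BY datedimid DESC
            LIMIT 1)
         - (SELECT nbaction
            FROM analyse_page_fait
            WHERE operateurdimid = 2
            ORDER BY datedimid DESC
            LIMIT 1 OFFSET 1) AS diff
    """
).fetchone()[0]

print(diff)
```

With these sample values the latest row has nbaction 40 and the one before it 25, so the difference is 15.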

Find preceding and following rows for a matching row in BigQuery?

Is it possible to find the rows preceding and following a matching row in a BigQuery query? For example if I do:
select textPayload from logs.logs_20160709 where textPayload like "%something%"
and say that I get these results back:
something A
something B
How can I also show the 3 rows preceding and following the matching rows? Something like this:
some text 1
some text 2
some text 3
something A
some text 4
some text 5
some text 6
some text 90
some text 91
some text 92
something B
some text 93
some text 94
some text 95
Is this possible and if so how?
While on Zuma Beach, I was thinking about how to avoid the CROSS JOIN in my original answer.
The query below should be much cheaper, especially for a big data set:
SELECT textPayload
FROM (
SELECT textPayload,
SUM(match) OVER(ORDER BY ts ROWS BETWEEN 3 PRECEDING AND 3 FOLLOWING) AS flag
FROM (
SELECT textPayload, ts, IF(textPayload CONTAINS 'something', 1, 0) AS match
FROM YourTable
)
)
WHERE flag > 0
Of course, another way to avoid the cross join is to use BigQuery Standard SQL. Still, the above solution with no joins at all is better than my original answer.
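The join-free approach above can be tried locally with a sketch like this one. It runs on SQLite through Python's sqlite3 module rather than BigQuery, so LIKE replaces CONTAINS and CASE replaces IF. The table name logs, the ts values and the sample rows are assumptions for illustration, and the helper column is named is_match because MATCH is a keyword in SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE logs (ts INTEGER, textPayload TEXT)")
# Hypothetical data: ten rows, with the only match at ts = 5.
conn.executemany(
    "INSERT INTO logs VALUES (?, ?)",
    [(ts, "something A" if ts == 5 else f"some text {ts}") for ts in range(1, 11)],
)

# A row is kept when any row within 3 positions of it (before or after)
# is a match, i.e. when the windowed sum of the match flags is positive.
context = [
    payload
    for (payload,) in conn.execute(
        """
        SELECT textPayload
        FROM (
            SELECT textPayload, ts,
                   SUM(is_match) OVER (
                       ORDER BY ts
                       ROWS BETWEEN 3 PRECEDING AND 3 FOLLOWING
                   ) AS flag
            FROM (
                SELECT textPayload, ts,
                       CASE WHEN textPayload LIKE '%something%'
                            THEN 1 ELSE 0 END AS is_match
                FROM logs
            )
        )
        WHERE flag > 0
        ORDER BY ts
        """
    )
]

print(context)
```

With the match at ts = 5, the rows returned are those with ts from 2 to 8.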
I think one piece is missing in your example: an extra field that defines the order. I added a ts field for this in my answer, so I assume your table has two relevant fields: textPayload and ts.
Try the query below. It should give you exactly what you need:
SELECT
all.textPayload
FROM (
SELECT start, finish
FROM (
SELECT textPayload,
LAG(ts, 3) OVER(ORDER BY ts ROWS BETWEEN 3 PRECEDING AND CURRENT ROW) AS start,
LEAD(ts, 3) OVER(ORDER BY ts ROWS BETWEEN CURRENT ROW AND 3 FOLLOWING) AS finish
FROM YourTable
)
WHERE textPayload CONTAINS 'something'
) AS matches
CROSS JOIN YourTable AS all
WHERE all.ts BETWEEN matches.start AND matches.finish
Please note: depending on the type of your ts field, you might need to do some casting of this field in the query.