Running Count by Group and Flag in BigQuery?

I have a table that looks like the below:
Row | Fullvisitorid | Visitid | New_Session_Flag
1 | A | 111 | 1
2 | A | 120 | 0
3 | A | 128 | 0
4 | A | 133 | 0
5 | A | 745 | 1
6 | A | 777 | 0
7 | B | 388 | 1
8 | B | 401 | 0
9 | B | 420 | 0
10 | B | 777 | 1
11 | B | 784 | 0
12 | B | 791 | 0
13 | B | 900 | 1
14 | B | 904 | 0
What I want to do is this: if it's the first row for a fullvisitorid, set the field to 1; otherwise use the value from the row above, but if new_session_flag = 1, use the row above plus 1. An example of the output I'm looking for is below:
Row | Fullvisitorid | Visitid | New_Session_Flag | Rank_Session_Order
1 | A | 111 | 1 | 1
2 | A | 120 | 0 | 1
3 | A | 128 | 0 | 1
4 | A | 133 | 0 | 1
5 | A | 745 | 1 | 2
6 | A | 777 | 0 | 2
7 | B | 388 | 1 | 1
8 | B | 401 | 0 | 1
9 | B | 420 | 0 | 1
10 | B | 777 | 1 | 2
11 | B | 784 | 0 | 2
12 | B | 791 | 0 | 2
13 | B | 900 | 1 | 3
14 | B | 904 | 0 | 3
As you can see:
Row 1 is 1 because it's the first time fullvisitorid A appears
Row 2 is 1 because it's not the first time fullvisitorid A appears and new_session_flag <> 1, therefore it uses the row above (i.e. 1)
Row 5 is 2 because it's not the first time fullvisitorid A appears and new_session_flag = 1, therefore it uses the row above (i.e. 1) plus 1
Row 7 is 1 because it's the first time fullvisitorid B appears
etc.
I believe this can be done with a RETAIN statement in SAS, but is there an equivalent in Google BigQuery?
Hopefully the above makes sense; let me know if not.
Thanks in advance

Below is for BigQuery Standard SQL
#standardSQL
SELECT *,
COUNTIF(New_Session_Flag = 1) OVER(PARTITION BY Fullvisitorid ORDER BY Visitid) Rank_Session_Order
FROM `project.dataset.table`

The answer by Mikhail Berlyant using a conditional window count is correct and works. I am answering because I find that a window sum is even simpler (and possibly more efficient on a large dataset):
select
t.*,
sum(new_session_flag) over(partition by fullvisitorid order by visitid) rank_session_order
from mytable t
This works because new_session_flag contains only 0s and 1s, so counting the 1s is equivalent to summing all the values.
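The equivalence of the two answers can be checked on the question's sample data. The sketch below is a minimal check, not BigQuery itself. It uses Python's built-in sqlite3 module as a stand-in and assumes a SQLite build with window functions (3.25 or later). SQLite has no COUNTIF, so the conditional count is written as SUM over a CASE. The table name `t` is hypothetical.

```python
import sqlite3

# Sample rows from the question: (Fullvisitorid, Visitid, New_Session_Flag)
ROWS = [
    ("A", 111, 1), ("A", 120, 0), ("A", 128, 0), ("A", 133, 0),
    ("A", 745, 1), ("A", 777, 0),
    ("B", 388, 1), ("B", 401, 0), ("B", 420, 0), ("B", 777, 1),
    ("B", 784, 0), ("B", 791, 0), ("B", 900, 1), ("B", 904, 0),
]

QUERY = """
SELECT fullvisitorid, visitid,
       -- conditional count (stand-in for BigQuery's COUNTIF)
       SUM(CASE WHEN new_session_flag = 1 THEN 1 ELSE 0 END)
           OVER (PARTITION BY fullvisitorid ORDER BY visitid) AS by_count,
       -- a 0/1 flag can simply be summed
       SUM(new_session_flag)
           OVER (PARTITION BY fullvisitorid ORDER BY visitid) AS by_sum
FROM t
ORDER BY fullvisitorid, visitid
"""


def rank_session_order(rows):
    """Return (visitor, visit, by_count, by_sum) for every input row."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE t (fullvisitorid TEXT, visitid INTEGER, new_session_flag INTEGER)"
    )
    con.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)
    out = con.execute(QUERY).fetchall()
    con.close()
    return out


if __name__ == "__main__":
    for row in rank_session_order(ROWS):
        print(row)
```

Both columns reproduce Rank_Session_Order from the question. With `ORDER BY` and no explicit frame, the window runs from the start of the partition to the current row, which is what turns the count or sum into a running total.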

Related

How to subtract by period in another column

I need results in the Minus column like the ones below.
For example, we take the first result for A, which is 23 (period 1),
and compute 34 (period 2) - 23 (period 1) = 11, then 23 (period 3) - 23 (period 1) = 0,
and so on, for each category.
+--------+----------+--------+-------+
| Period | Category | Result | Minus |
+--------+----------+--------+-------+
| 1 | A | 23 | n/a |
| 1 | B | 24 | n/a |
| 1 | C | 25 | n/a |
| 2 | A | 34 | 11 |
| 2 | B | 23 | -1 |
| 2 | C | 1 | -24 |
| 3 | A | 23 | 0 |
| 3 | B | 90 | 66 |
| 3 | C | 21 | -4 |
+--------+----------+--------+-------+
Could you help me?
Could we use partitions or lead here?
SELECT
*,
Result - FIRST_VALUE(Result) OVER (PARTITION BY Category ORDER BY Period) AS Minus
FROM
yourTable
This doesn't create the n/a values, but returns 0 instead. I'm not sure returning an arbitrary string in an integer column makes sense, so I didn't do it.
If you really need to avoid the 0, you could use a CASE expression to return NULL instead:
CASE WHEN 1 = Period
THEN NULL
ELSE Result - FIRST_VALUE(Result) OVER (PARTITION BY Category ORDER BY Period)
END
Or, even more robustly (this version does not assume that every category starts at period 1):
CASE WHEN 1 = ROW_NUMBER() OVER (PARTITION BY Category ORDER BY Period)
THEN NULL
ELSE Result - FIRST_VALUE(Result) OVER (PARTITION BY Category ORDER BY Period)
END
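The ROW_NUMBER variant can be checked on the question's data. The sketch below runs it in Python's built-in sqlite3 module as a stand-in for the original database and assumes SQLite 3.25 or later for window functions. The table name `t` is hypothetical.

```python
import sqlite3

# (Period, Category, Result) from the question
ROWS = [
    (1, "A", 23), (1, "B", 24), (1, "C", 25),
    (2, "A", 34), (2, "B", 23), (2, "C", 1),
    (3, "A", 23), (3, "B", 90), (3, "C", 21),
]

QUERY = """
SELECT period, category, result,
       CASE WHEN ROW_NUMBER() OVER (PARTITION BY category ORDER BY period) = 1
            THEN NULL  -- first row of each category: nothing to subtract
            ELSE result - FIRST_VALUE(result)
                          OVER (PARTITION BY category ORDER BY period)
       END AS minus
FROM t
ORDER BY period, category
"""


def minus_from_first(rows):
    """Return (period, category, result, minus); minus is None for each first row."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (period INTEGER, category TEXT, result INTEGER)")
    con.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)
    out = con.execute(QUERY).fetchall()
    con.close()
    return out


if __name__ == "__main__":
    for row in minus_from_first(ROWS):
        print(row)
```

The NULLs stand in for the question's n/a values. The remaining rows match the expected Minus column.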
(Apologies for any typos, etc, I'm on my phone.)
You can do:
select b.*, b.result - a.result as "minus"
from t a
join t b on b.category = a.category and a.period = 1
Result:
period category result minus
------- --------- ------- -----
1 A 23 0
1 B 24 0
1 C 25 0
2 A 34 11
2 B 23 -1
2 C 1 -24
3 A 23 0
3 B 90 66
3 C 21 -4
See a running example at DB Fiddle.
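The self-join can be checked in the same way. The sketch below uses Python's sqlite3 module as a stand-in and reproduces the Result table above. It assumes, as the join does, that every category has exactly one row with period = 1. A category without one would drop out of the result.

```python
import sqlite3

# (Period, Category, Result) from the question
ROWS = [
    (1, "A", 23), (1, "B", 24), (1, "C", 25),
    (2, "A", 34), (2, "B", 23), (2, "C", 1),
    (3, "A", 23), (3, "B", 90), (3, "C", 21),
]

QUERY = """
SELECT b.period, b.category, b.result, b.result - a.result AS minus
FROM t AS a
JOIN t AS b
  ON b.category = a.category  -- pair every row with ...
 AND a.period = 1             -- ... the period-1 row of its category
ORDER BY b.period, b.category
"""


def minus_by_self_join(rows):
    """Return (period, category, result, minus) using a join to period 1."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (period INTEGER, category TEXT, result INTEGER)")
    con.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)
    out = con.execute(QUERY).fetchall()
    con.close()
    return out


if __name__ == "__main__":
    for row in minus_by_self_join(ROWS):
        print(row)
```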
OK, so as not to duplicate my question:
how would I do it if the FIRST_VALUE logic has to start over for each new Sub period?
+------------+----------+------------+----------+---------+
| Sub period | Period | Category | Result | Minus |
+------------+----------+------------+----------+---------+
| SA | 1 | A | 23 | n/a |
| SA | 2 | A | 34 | 11 |
| SA | 3 | A | 35 | 12 |
| SA | 4 | A | 36 | 13 |
| KS | 1 | A | 23 | n/a |
| KS | 2 | A | 21 | -2 |
| KS | 3 | A | 23 | 0 |
| KS | 4 | A | 21 | -2 |
+------------+----------+------------+----------+---------+
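One possible adaptation, which is not taken from the answers above, is to add the sub period to the PARTITION BY so that FIRST_VALUE and ROW_NUMBER restart for every (sub period, category) pair. The sketch below checks this idea on the follow-up's data with Python's sqlite3 module as a stand-in. The column name `sub_period` and the table name `t` are hypothetical.

```python
import sqlite3

# (Sub period, Period, Category, Result) from the follow-up
ROWS = [
    ("SA", 1, "A", 23), ("SA", 2, "A", 34), ("SA", 3, "A", 35), ("SA", 4, "A", 36),
    ("KS", 1, "A", 23), ("KS", 2, "A", 21), ("KS", 3, "A", 23), ("KS", 4, "A", 21),
]

QUERY = """
SELECT sub_period, period, category, result,
       CASE WHEN ROW_NUMBER() OVER w = 1
            THEN NULL  -- first row of each (sub_period, category) group
            ELSE result - FIRST_VALUE(result) OVER w
       END AS minus
FROM t
WINDOW w AS (PARTITION BY sub_period, category ORDER BY period)
ORDER BY sub_period, period
"""


def minus_per_sub_period(rows):
    """Return (sub_period, period, category, result, minus)."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE t (sub_period TEXT, period INTEGER, category TEXT, result INTEGER)"
    )
    con.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", rows)
    out = con.execute(QUERY).fetchall()
    con.close()
    return out


if __name__ == "__main__":
    for row in minus_per_sub_period(ROWS):
        print(row)
```

The named `WINDOW` clause only avoids repeating the partition definition. Databases without it can repeat `OVER (PARTITION BY sub_period, category ORDER BY period)` inline.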

SQL generate unique ID from rolling ID

I've been trying to find an answer to this for the better part of a day with no luck.
I have a SQL table with measurement data for samples and I need a way to assign a unique ID to each sample. Right now each sample has an ID number that rolls over frequently. What I need is a unique ID for each sample. Below is a table with a simplified dataset, as well as an example of a possible UID that would do what I need.
| Row | Time | Meas# | Sample# | UID (Desired) |
| 1 | 09:00 | 1 | 1 | 1 |
| 2 | 09:01 | 2 | 1 | 1 |
| 3 | 09:02 | 3 | 1 | 1 |
| 4 | 09:07 | 1 | 2 | 2 |
| 5 | 09:08 | 2 | 2 | 2 |
| 6 | 09:09 | 3 | 2 | 2 |
| 7 | 09:24 | 1 | 3 | 3 |
| 8 | 09:25 | 2 | 3 | 3 |
| 9 | 09:25 | 3 | 3 | 3 |
| 10 | 09:47 | 1 | 1 | 4 |
| 11 | 09:47 | 2 | 1 | 4 |
| 12 | 09:49 | 3 | 1 | 4 |
My problem is that rows 10-12 have the same Sample# as rows 1-3. I need a way to uniquely identify and group each sample. Having the row number or time of the first measurement on the sample would be good.
One other complication is that the measurement number doesn't always start with 1. It's based on measurement locations, and sometimes it skips location 1 and only has locations 2 and 3.
I am going to speculate that you want a unique number assigned to each sample, where now you have repeats.
If so, you can use lag() and a cumulative sum:
select t.*,
sum(case when prev_sample = sample then 0 else 1 end) over (order by row) as new_sample_number
from (select t.*,
lag(sample) over (order by row) as prev_sample
from t
) t;
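The LAG plus running-sum pattern can be checked on the question's sample. The sketch below uses Python's sqlite3 module as a stand-in (SQLite 3.25 or later). The column names `row_num`, `meas_time`, `meas_no` and `sample_no` are hypothetical renames, because `#` is awkward in identifiers and ROW is a keyword in some databases.

```python
import sqlite3

# (Row, Time, Meas#, Sample#) from the question
ROWS = [
    (1, "09:00", 1, 1), (2, "09:01", 2, 1), (3, "09:02", 3, 1),
    (4, "09:07", 1, 2), (5, "09:08", 2, 2), (6, "09:09", 3, 2),
    (7, "09:24", 1, 3), (8, "09:25", 2, 3), (9, "09:25", 3, 3),
    (10, "09:47", 1, 1), (11, "09:47", 2, 1), (12, "09:49", 3, 1),
]

QUERY = """
SELECT row_num, sample_no,
       -- 1 whenever Sample# differs from the previous row (or there is none)
       SUM(CASE WHEN prev_sample = sample_no THEN 0 ELSE 1 END)
           OVER (ORDER BY row_num) AS new_sample_number
FROM (
    SELECT t.*, LAG(sample_no) OVER (ORDER BY row_num) AS prev_sample
    FROM t
) AS s
ORDER BY row_num
"""


def unique_sample_ids(rows):
    """Return (row_num, sample_no, new_sample_number) for every row."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE t (row_num INTEGER, meas_time TEXT, meas_no INTEGER, sample_no INTEGER)"
    )
    con.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", rows)
    out = con.execute(QUERY).fetchall()
    con.close()
    return out


if __name__ == "__main__":
    for row in unique_sample_ids(ROWS):
        print(row)
```

Because the new ID depends only on Sample# changing, it also works when a sample skips measurement location 1. It would, however, merge two consecutive samples that happen to share the same Sample#.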

How to select if similar field count is the maximum in the table?

I want to select from a table the groups whose count of matching rows is the maximum, depending on other columns.
For example:
| user_id | team_id | isOk |
| 1 | 1 | 1 |
| 2 | 1 | 1 |
| 3 | 1 | 1 |
| 4 | 1 | 1 |
| 5 | 2 | 1 |
| 6 | 2 | 1 |
| 7 | 2 | 1 |
| 8 | 3 | 1 |
| 9 | 3 | 1 |
| 10 | 3 | 1 |
| 11 | 3 | 0 |
So I want to select teams 1 and 2, because all of their rows have the value 1 in the isOk column.
I tried to use this query:
SELECT Team
FROM _Table1
WHERE isOk= 1
GROUP BY Team
HAVING COUNT(*) > 3
But I still have to hard-code a row count, which may or may not be the maximum.
Thanks in advance.
Is this what you are looking for?
select team
from _table1
group by team
having min(isOk) = 1;
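Because isOk only holds 0 or 1, `min(isOk) = 1` is true exactly when a team has no 0 rows, so no row count has to be hard-coded. The sketch below checks this with Python's sqlite3 module as a stand-in, using the sample table and the team_id column name from the question. The table name `t` is hypothetical.

```python
import sqlite3

# (user_id, team_id, isOk) from the question
ROWS = [
    (1, 1, 1), (2, 1, 1), (3, 1, 1), (4, 1, 1),
    (5, 2, 1), (6, 2, 1), (7, 2, 1),
    (8, 3, 1), (9, 3, 1), (10, 3, 1), (11, 3, 0),
]

QUERY = """
SELECT team_id
FROM t
GROUP BY team_id
HAVING MIN(isOk) = 1  -- no row in the team has isOk = 0
ORDER BY team_id
"""


def teams_all_ok(rows):
    """Return the team_ids whose rows all have isOk = 1."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (user_id INTEGER, team_id INTEGER, isOk INTEGER)")
    con.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)
    teams = [r[0] for r in con.execute(QUERY)]
    con.close()
    return teams


if __name__ == "__main__":
    print(teams_all_ok(ROWS))
```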

Alternation of positive and negative values

Thank you for your attention.
I have a table called "PROD_COST" with 5 fields
(ID, Duration, Cost, COST_NEXT, COST_CHANGE).
I need an extra field called "groups" for aggregation.
Duration = number of days the price is valid (1 day = 1 row).
Cost = product price on that day.
Cost_next = lead(cost,1,0).
Cost_change = Cost_next - Cost.
Now I need to group by Cost_change, which can be
positive, negative or 0.
+----+----------+------+-----------+-------------+
| ID | Duration | Cost | COST_NEXT | COST_CHANGE |
+----+----------+------+-----------+-------------+
| 1  | 1        | 10   | 8,5       | -1,5        |
| 2  | 1        | 8,5  | 12,2      | 3,7         |
| 3  | 1        | 12,2 | 5,3       | -6,9        |
| 4  | 1        | 5,3  | 4,2       | 1,2         |
| 5  | 1        | 4,2  | 6,2       | 2           |
| 6  | 1        | 6,2  | 9,2       | 3           |
| 7  | 1        | 9,2  | 7,5       | -2,7        |
| 8  | 1        | 7,5  | 6,2       | -1,3        |
| 9  | 1        | 6,2  | 6,3       | 0,1         |
| 10 | 1        | 6,3  | 7,2       | 0,9         |
| 11 | 1        | 7,2  | 7,5       | 0,3         |
| 12 | 1        | 7,5  | 0         | 7,5         |
+----+----------+------+-----------+-------------+
I need a query that starts a new group each time the values switch between negative and positive (+ - + - + -). The last field below is what I want.
Sorry for my English.
+----+----------+------+-----------+-------------+--------+
| ID | Duration | Cost | COST_NEXT | COST_CHANGE | groups |
+----+----------+------+-----------+-------------+--------+
| 1  | 1        | 10   | 8,5       | -1,5        | 1      |
| 2  | 1        | 8,5  | 12,2      | 3,7         | 2      |
| 3  | 1        | 12,2 | 5,3       | -6,9        | 3      |
| 4  | 1        | 5,3  | 4,2       | 1,2         | 4      |
| 5  | 1        | 4,2  | 6,2       | 2           | 4      |
| 6  | 1        | 6,2  | 9,2       | 3           | 4      |
| 7  | 1        | 9,2  | 7,5       | -2,7        | 5      |
| 8  | 1        | 7,5  | 6,2       | -1,3        | 5      |
| 9  | 1        | 6,2  | 6,3       | 0,1         | 6      |
| 10 | 1        | 6,3  | 7,2       | 0,9         | 6      |
| 11 | 1        | 7,2  | 7,5       | 0,3         | 6      |
| 12 | 1        | 7,5  | 0         | 7,5         | 6      |
+----+----------+------+-----------+-------------+--------+
If you're on SQL Server 2012 or later, you can use window functions to do this:
select
id, COST_CHANGE, sum(GRP) over (order by id asc) +1
from
(
select
*,
case when sign(COST_CHANGE) != sign(isnull(lag(COST_CHANGE)
over (order by id asc),COST_CHANGE)) then 1 else 0 end as GRP
from
PROD_COST
) X
LAG gets the value from the previous row, and the CASE compares its sign with the sign of the current row. If the signs don't match, the CASE returns 1. The outer SELECT computes a running total of these flags, so the total increases by one every time there is a 1.
The same logic works with older versions too; you'll just have to fetch the previous row from the table using the id and compute the running total by re-calculating all rows before the current one.
Example in SQL Fiddle
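The query can be checked on the question's COST_CHANGE values, with the decimal commas written as decimal points. The sketch below uses Python's sqlite3 module as a stand-in for SQL Server. Older SQLite builds have no SIGN(), so a Python `sign` function is registered with `create_function` (an assumption standing in for SQL Server's SIGN). COALESCE replaces SQL Server's ISNULL. The alias `grp_no` is hypothetical; GROUPS is a keyword in SQLite.

```python
import sqlite3

# (ID, COST_CHANGE) from the question
ROWS = [
    (1, -1.5), (2, 3.7), (3, -6.9), (4, 1.2), (5, 2.0), (6, 3.0),
    (7, -2.7), (8, -1.3), (9, 0.1), (10, 0.9), (11, 0.3), (12, 7.5),
]

QUERY = """
SELECT id, cost_change, SUM(grp) OVER (ORDER BY id) + 1 AS grp_no
FROM (
    SELECT *,
           -- 1 when the sign differs from the previous row's sign
           CASE WHEN sign(cost_change)
                     != sign(COALESCE(LAG(cost_change) OVER (ORDER BY id), cost_change))
                THEN 1 ELSE 0 END AS grp
    FROM prod_cost
) AS x
ORDER BY id
"""


def sign(x):
    """SQL Server-style SIGN(): -1, 0 or 1; NULL stays NULL."""
    if x is None:
        return None
    return (x > 0) - (x < 0)


def group_by_sign_change(rows):
    """Return (id, cost_change, grp_no) for every row."""
    con = sqlite3.connect(":memory:")
    con.create_function("sign", 1, sign)
    con.execute("CREATE TABLE prod_cost (id INTEGER, cost_change REAL)")
    con.executemany("INSERT INTO prod_cost VALUES (?, ?)", rows)
    out = con.execute(QUERY).fetchall()
    con.close()
    return out


if __name__ == "__main__":
    for row in group_by_sign_change(ROWS):
        print(row)
```

The first row compares with itself through COALESCE, so it contributes 0, and the `+ 1` makes the groups start at 1.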
James's answer is close, but it doesn't handle zero values correctly. This is a pretty easy modification. One tricky approximation uses the difference between the signs of consecutive cost changes:
select id, COST_CHANGE, sum(IsNewGroup) over (order by id asc)
from (select pc.*,
(case when sign(cost_change) - sign(lag(cost_change) over (order by id)) between -1 and 1
then 0
else 1 -- `NULL` intentionally goes here
end) IsNewGroup
from Prod_Cost
) pc
For clarity, here is a SQL Fiddle with zero values. From my understanding of the question, the OP only wants an actual sign change.
This may still not be correct. The OP simply is not clear about what to do about 0 values.
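The sign-difference approach behaves differently from a plain sign comparison when zeros are involved. The sketch below uses Python's sqlite3 module as a stand-in. Here the running sum is used without a trailing `+1`, because the first row (whose LAG is NULL) already counts as a new group. A Python `sign` registered with `create_function` stands in for SQL Server's SIGN. The zero-containing sample rows are hypothetical, not from the question.

```python
import sqlite3

QUERY = """
SELECT id, SUM(is_new_group) OVER (ORDER BY id) AS grp_no
FROM (
    SELECT *,
           CASE WHEN (sign(cost_change) - sign(LAG(cost_change) OVER (ORDER BY id)))
                     BETWEEN -1 AND 1
                THEN 0
                ELSE 1  -- a jump of 2 (+ to - or - to +) or NULL on the first row
           END AS is_new_group
    FROM prod_cost
) AS pc
ORDER BY id
"""


def sign(x):
    """SQL Server-style SIGN(): -1, 0 or 1; NULL stays NULL."""
    if x is None:
        return None
    return (x > 0) - (x < 0)


def group_ids(rows):
    """Return the group number for each (id, cost_change) row, ordered by id."""
    con = sqlite3.connect(":memory:")
    con.create_function("sign", 1, sign)
    con.execute("CREATE TABLE prod_cost (id INTEGER, cost_change REAL)")
    con.executemany("INSERT INTO prod_cost VALUES (?, ?)", rows)
    ids = [r[1] for r in con.execute(QUERY)]
    con.close()
    return ids


QUESTION_ROWS = [
    (1, -1.5), (2, 3.7), (3, -6.9), (4, 1.2), (5, 2.0), (6, 3.0),
    (7, -2.7), (8, -1.3), (9, 0.1), (10, 0.9), (11, 0.3), (12, 7.5),
]
# Hypothetical rows with zeros: a zero joins the group of its neighbours
ZERO_ROWS = [(1, 1.0), (2, 0.0), (3, 1.0), (4, -1.0), (5, 0.0), (6, -1.0)]

if __name__ == "__main__":
    print(group_ids(QUESTION_ROWS))
    print(group_ids(ZERO_ROWS))
```

A zero never starts a group on its own, which is what makes this an approximation: a sequence such as +, 0, - stays in a single group even though the sign changed across the zero.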

SQL report show result in one line of group

I am trying to reach the following result:
ID | Part | QTY| Boxes| Reference
1 | ABC123 | 20 | 0 | REF0001
2 | ABC345 | 10 | 0 | REF0001
3 | ABC487 | 5 | 1 | REF0001
4 | SEF453 | 4 | 0 | REF0002
5 | ABDS12 | 82 | 4 | REF0002
6 | EFR488 | 64 | 0 | REF0003
7 | XCV345 | 58 | 0 | REF0003
8 | SSFS33 | 23 | 3 | REF0003
Right now I get
ID | Part | QTY| Boxes| Reference
1 | ABC123 | 20 | 1 | REF0001
2 | ABC345 | 10 | 1 | REF0001
3 | ABC487 | 5 | 1 | REF0001
4 | SEF453 | 4 | 4 | REF0002
5 | ABDS12 | 82 | 4 | REF0002
6 | EFR488 | 64 | 3 | REF0003
7 | XCV345 | 58 | 3 | REF0003
8 | SSFS33 | 23 | 3 | REF0003
As you can see, the box quantity for each reference is repeated on every row, and I need it to appear only once per reference.
Well, here is one way . . .
with t as (<your current query>)
select ID, Part, QTY,
max(Boxes) over (partition by Reference) as Boxes,
Reference
from t
Assigning row numbers within each Reference, ordered by ID descending, marks the row with the highest ID in each reference as 1; the main query checks this mark and outputs 0 for Boxes on every other row.
; with q as
(
select *,
row_number() over (partition by Reference
order by ID desc) rn
from
(
your-query-here
) a
)
select q.ID,
q.Part,
q.QTY,
case when rn = 1 then q.Boxes else 0 end as Boxes,
q.Reference
from q
order by q.ID
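The ROW_NUMBER answer can be checked by feeding it the question's "Right now I get" rows in place of `your-query-here`. The sketch below uses Python's sqlite3 module as a stand-in for SQL Server (SQLite 3.25 or later). The table name `current_result` is hypothetical.

```python
import sqlite3

# (ID, Part, QTY, Boxes, Reference) as currently returned in the question
ROWS = [
    (1, "ABC123", 20, 1, "REF0001"), (2, "ABC345", 10, 1, "REF0001"),
    (3, "ABC487", 5, 1, "REF0001"), (4, "SEF453", 4, 4, "REF0002"),
    (5, "ABDS12", 82, 4, "REF0002"), (6, "EFR488", 64, 3, "REF0003"),
    (7, "XCV345", 58, 3, "REF0003"), (8, "SSFS33", 23, 3, "REF0003"),
]

QUERY = """
WITH q AS (
    SELECT *,
           -- rn = 1 on the highest ID of each reference
           ROW_NUMBER() OVER (PARTITION BY reference ORDER BY id DESC) AS rn
    FROM current_result
)
SELECT id, part, qty,
       CASE WHEN rn = 1 THEN boxes ELSE 0 END AS boxes,
       reference
FROM q
ORDER BY id
"""


def boxes_once_per_reference(rows):
    """Return (id, part, qty, boxes, reference) with boxes kept on one row per reference."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE current_result (id INTEGER, part TEXT, qty INTEGER, boxes INTEGER, reference TEXT)"
    )
    con.executemany("INSERT INTO current_result VALUES (?, ?, ?, ?, ?)", rows)
    out = con.execute(QUERY).fetchall()
    con.close()
    return out


if __name__ == "__main__":
    for row in boxes_once_per_reference(ROWS):
        print(row)
```

The Boxes column then matches the desired result: the count appears only on the last row of each reference.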