I need to aggregate consecutive values in a table with BigQuery, as shown in the example below.
segment can only be 'A' or 'B'; value is a string.
Basically, for each id I need to consider only segment = 'A', taking the gaps into account.
The rows should be ordered by date_column ASC.
Example
id, segment, value, date_column
1, A, 3, daytime
1, A, 2, daytime
1, A, x, daytime
1, B, 3, daytime
1, B, 3, daytime
1, B, 3, daytime
1, A, 7, daytime
1, A, 3, daytime
1, B, 3, daytime
1, A, 9, daytime
1, A, 9, daytime
2, A, 3, daytime
2, B, 3, daytime
2, A, 3, daytime
2, A, m, daytime
Expected result
id, agg_values_A_segment
1, ['32x', '73', '99']
2, ['3', '3m']
How can I achieve this result?
I'm struggling with the 'gap' between the segments.
Below are two options for BigQuery Standard SQL.
Option 1 - using analytic (window) functions
#standardSQL
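-- flag is TRUE whenever segment differs from the previous row's segment (per id, ordered by date_column)
-- COUNTIF(flag) as a running total turns those change points into a group number (grp) for each island
-- values are then concatenated per (id, grp) for segment 'A' and collected into an array per id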
SELECT id, ARRAY_AGG(values_in_group ORDER BY grp) agg_values_A_segment
FROM (
SELECT id, grp, STRING_AGG(value, '' ORDER BY date_column) values_in_group
FROM (
SELECT id, segment, value, date_column, flag,
COUNTIF(flag) OVER(PARTITION BY id ORDER BY date_column) grp
FROM (
SELECT *, IFNULL(LAG(segment) OVER(PARTITION BY id ORDER BY date_column), segment) != segment flag
FROM `project.dataset.table`
)
)
WHERE segment = 'A'
GROUP BY id, grp
)
GROUP BY id
You can test and play with the above using the sample data from your question, as in the example below:
#standardSQL
WITH `project.dataset.table` AS (
SELECT 1 id, 'A' segment, '3' value, DATETIME '2019-01-07T18:46:21' date_column UNION ALL
SELECT 1, 'A', '2', '2019-01-07T18:46:22' UNION ALL
SELECT 1, 'A', 'x', '2019-01-07T18:46:23' UNION ALL
SELECT 1, 'B', '3', '2019-01-07T18:46:24' UNION ALL
SELECT 1, 'B', '3', '2019-01-07T18:46:25' UNION ALL
SELECT 1, 'B', '3', '2019-01-07T18:46:26' UNION ALL
SELECT 1, 'A', '7', '2019-01-07T18:46:27' UNION ALL
SELECT 1, 'A', '3', '2019-01-07T18:46:28' UNION ALL
SELECT 1, 'B', '3', '2019-01-07T18:46:29' UNION ALL
SELECT 1, 'A', '9', '2019-01-07T18:46:30' UNION ALL
SELECT 1, 'A', '9', '2019-01-07T18:46:31' UNION ALL
SELECT 2, 'A', '3', '2019-01-07T18:46:32' UNION ALL
SELECT 2, 'B', '3', '2019-01-07T18:46:33' UNION ALL
SELECT 2, 'A', '3', '2019-01-07T18:46:34' UNION ALL
SELECT 2, 'A', 'm', '2019-01-07T18:46:35'
)
SELECT id, ARRAY_AGG(values_in_group ORDER BY grp) agg_values_A_segment
FROM (
SELECT id, grp, STRING_AGG(value, '' ORDER BY date_column) values_in_group
FROM (
SELECT id, segment, value, date_column, flag,
COUNTIF(flag) OVER(PARTITION BY id ORDER BY date_column) grp
FROM (
SELECT *, IFNULL(LAG(segment) OVER(PARTITION BY id ORDER BY date_column), segment) != segment flag
FROM `project.dataset.table`
)
)
WHERE segment = 'A'
GROUP BY id, grp
)
GROUP BY id
-- ORDER BY id
with the result:
Row id agg_values_A_segment
1 1 32x
73
99
2 2 3
3m
Option 2 - the option above should work for big volumes of rows per id, but it looks a little heavy. This second option is simpler, but it assumes you have some character or sequence of characters that you are sure can never be produced by combining your values - for example a pipe character or a tab. In the example below I chose the word 'delimiter', assuming it will not appear as a result of the concatenation.
#standardSQL
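-- every non-A value is replaced by the word 'delimiter' before concatenating per id,
-- so splitting on 'delimiter' yields one part per consecutive run of segment 'A' values;
-- empty parts (coming from adjacent non-A rows) are filtered out in the outer ARRAY(...)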
SELECT id,
ARRAY(SELECT part FROM UNNEST(parts) part WHERE part != '') agg_values_A_segment
FROM (
SELECT id,
SPLIT(STRING_AGG(IF(segment = 'A', value, 'delimiter'), '' ORDER BY date_column), 'delimiter') parts
FROM `project.dataset.table`
GROUP BY id
)
You can test and play with the above using the same sample data:
#standardSQL
WITH `project.dataset.table` AS (
SELECT 1 id, 'A' segment, '3' value, DATETIME '2019-01-07T18:46:21' date_column UNION ALL
SELECT 1, 'A', '2', '2019-01-07T18:46:22' UNION ALL
SELECT 1, 'A', 'x', '2019-01-07T18:46:23' UNION ALL
SELECT 1, 'B', '3', '2019-01-07T18:46:24' UNION ALL
SELECT 1, 'B', '3', '2019-01-07T18:46:25' UNION ALL
SELECT 1, 'B', '3', '2019-01-07T18:46:26' UNION ALL
SELECT 1, 'A', '7', '2019-01-07T18:46:27' UNION ALL
SELECT 1, 'A', '3', '2019-01-07T18:46:28' UNION ALL
SELECT 1, 'B', '3', '2019-01-07T18:46:29' UNION ALL
SELECT 1, 'A', '9', '2019-01-07T18:46:30' UNION ALL
SELECT 1, 'A', '9', '2019-01-07T18:46:31' UNION ALL
SELECT 2, 'A', '3', '2019-01-07T18:46:32' UNION ALL
SELECT 2, 'B', '3', '2019-01-07T18:46:33' UNION ALL
SELECT 2, 'A', '3', '2019-01-07T18:46:34' UNION ALL
SELECT 2, 'A', 'm', '2019-01-07T18:46:35'
)
SELECT id,
ARRAY(SELECT part FROM UNNEST(parts) part WHERE part != '') agg_values_A_segment
FROM (
SELECT id,
SPLIT(STRING_AGG(IF(segment = 'A', value, 'delimiter'), '' ORDER BY date_column), 'delimiter') parts
FROM `project.dataset.table`
GROUP BY id
)
-- ORDER BY id
obviously with the same result:
Row id agg_values_A_segment
1 1 32x
73
99
2 2 3
3m
Note: the second option can fail with a "resources exceeded" error when you have too many rows per id - you just need to try it on your real data.
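If you prefer a single-character separator (the pipe or tab mentioned above), here is a minimal sketch of the same Option 2 query using a tab character; it assumes a tab can never appear inside value:
#standardSQL
SELECT id,
  ARRAY(SELECT part FROM UNNEST(parts) part WHERE part != '') agg_values_A_segment
FROM (
  SELECT id,
    -- assumption: value never contains a tab character
    SPLIT(STRING_AGG(IF(segment = 'A', value, '\t'), '' ORDER BY date_column), '\t') parts
  FROM `project.dataset.table`
  GROUP BY id
)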
SQL tables represent unordered sets. This is particularly true in a parallel, columnar database such as BigQuery. The rest of this answer assumes you have a column that specifies the ordering of the rows.
This is a gaps-and-islands problem. You can use the difference of row_number() values to identify the adjacent groups, and then aggregate:
select id, array_agg(vals order by min_date_column) as agg_values_A_segment
from (select id, segment,
             string_agg(value, '' order by date_column) as vals,
             min(date_column) as min_date_column
      from (select t.*,
                   row_number() over (partition by id order by date_column) as seqnum,
                   row_number() over (partition by id, segment order by date_column) as seqnum_2
            from t
           ) t
      -- seqnum - seqnum_2 is constant within each run of consecutive rows of the same segment
      where segment = 'A'
      group by id, segment, (seqnum - seqnum_2)
     ) x
group by id;
Related
I have the following dataset in BigQuery: Dataset
When the type is V, count is always equal to zero.
When the type is V, I would like the count column to take the first count value found below that row on a row with type T.
The rows are ordered according to the group_id and position columns.
This is the final result I would like to have: Desired dataset
I tried this
FIRST_VALUE( count )
OVER (
PARTITION BY id_group,id_person
ORDER BY
CASE WHEN type LIKE "T" THEN 1 ELSE 0 END DESC,
position
ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
) as NEW_count
but it always gives me the first count that has type T, when I want the first such value below the current row.
I don't think this scenario can be solved using navigation functions (LEAD, LAG, FIRST_VALUE, LAST_VALUE, NTH_VALUE), since the position of the closest T value is not deterministic.
You need to query the same dataset to find the closest T value using subqueries.
Here you have a working example:
WITH source_data AS (
SELECT 'A1' AS id_group, 'a' AS id_person, 1 AS position, 'V' AS type, 0 AS count
UNION ALL
SELECT 'A1', 'b', 2, 'V', 0
UNION ALL
SELECT 'A1', 'c', 3, 'T', 13
UNION ALL
SELECT 'A1', 'd', 4, 'V', 0
UNION ALL
SELECT 'A1', 'e', 5, 'T', 5
UNION ALL
SELECT 'A1', 'f', 6, 'T', 7
UNION ALL
SELECT 'A1', 'g', 7, 'V', 0
UNION ALL
SELECT 'A1', 'h', 8, 'V', 0
UNION ALL
SELECT 'A1', 'i', 9, 'V', 0
UNION ALL
SELECT 'A1', 'j', 10, 'T', 0
)
SELECT *,
(SELECT count FROM source_data counts WHERE counts.position =
(SELECT MIN(t_values.position) FROM source_data t_values WHERE t_values.type='T' and t_values.position > source.position)) AS new_count
FROM source_data source
You can COALESCE the t_value if you need 0s instead of NULLs.
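For instance, a minimal sketch that wraps the correlated subquery from above in COALESCE (the new_count alias is just illustrative):
SELECT *,
  COALESCE(
    (SELECT count FROM source_data counts WHERE counts.position =
      (SELECT MIN(t_values.position) FROM source_data t_values
       WHERE t_values.type = 'T' AND t_values.position > source.position)),
    0) AS new_count
FROM source_data source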
You can consider the below query for your requirement.
with cte as (
select 'A1' id_group, 'a' id_person, 1 position, 'V' type, 0 count union all
select 'A1','b',2,'V',0 union all
select 'A1','c',3,'T',13 union all
select 'A1','d',4,'V',0 union all
select 'A1','e',5,'T',5 union all
select 'A1','f',6,'T',7 union all
select 'A1','g',7,'V',0 union all
select 'A1','h',8,'V',0 union all
select 'A1','i',9,'V',0 union all
select 'A1','j',10,'T',0
)
select *, last_value(count_1 ignore nulls) over (order by position desc) new_count
from (select *,
             case when type = 'V' and count = 0 then null else count end count_1
      from cte
     )
order by position
Please help me.
How can I find the records with a given id and status, e.g. "d", but only the ones that have passed through a previous status, e.g. "b", earlier in time, and not the other way around?
From Oracle 12, you can use MATCH_RECOGNIZE to perform row-by-row comparisons:
SELECT id, status, timestamp
FROM table_name
MATCH_RECOGNIZE(
PARTITION BY id
ORDER BY timestamp
ALL ROWS PER MATCH
PATTERN ( {- b_status any_row*? -} d_status )
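-- the {- ... -} exclusion means the b row and any rows between it and the d are part of the match
-- but are excluded from the output, so only the d row is returned for each match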
DEFINE
b_status AS status = 'b',
d_status AS status = 'd'
)
You can also, in earlier versions, use analytic functions:
SELECT id, status, timestamp
FROM (
SELECT t.*,
COUNT(CASE WHEN status = 'b' THEN 1 END)
OVER (PARTITION BY id ORDER BY timestamp) AS has_b
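-- has_b counts the 'b' rows for this id with a timestamp at or before the current row's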
FROM table_name t
)
WHERE status = 'd'
AND has_b > 0;
Which, for the sample data:
CREATE TABLE table_name (id, status, timestamp) AS
SELECT 100, 'a', DATE '2022-02-01' FROM DUAL UNION ALL
SELECT 100, 'b', DATE '2022-02-02' FROM DUAL UNION ALL
SELECT 100, 'c', DATE '2022-02-03' FROM DUAL UNION ALL
SELECT 100, 'd', DATE '2022-02-04' FROM DUAL UNION ALL
SELECT 200, 'g', DATE '2022-02-05' FROM DUAL UNION ALL
SELECT 200, 's', DATE '2022-02-06' FROM DUAL UNION ALL
SELECT 200, 'd', DATE '2022-02-07' FROM DUAL UNION ALL
SELECT 200, 'a', DATE '2022-02-08' FROM DUAL;
Both output:
ID    STATUS    TIMESTAMP
100   d         2022-02-04 00:00:00
db<>fiddle here
You could also use a solution based on an EXISTS clause.
select t1.ID, t1.status, t1.timestamp
from Your_Table t1
where t1.status = 'd'
and exists (
select null
from Your_Table t2
where t1.id = t2.id
and t1.timestamp > t2.timestamp
and t2.status in ('b')
)
;
demo on db<>fiddle
In my database that represents a car service station, I am trying to figure out a SQL query that would give me the total average of how much a customer pays for a single service, but instead of taking AVG() of the price over all existing invoices, I want to group the invoices by the same reservation_id. After that, I would like to get the total average of all of those grouped results.
I am using the two tables listed in the picture below. I want to get the value of the total average price by applying AVG() to all the averages that are produced by grouping prices by the same FK Reservation_reservation_id.
I tried to make this into a single query but failed, so I came looking for help from more experienced users. Also, I need to select (get) only the result of the total average. This result should give me an overview of how much each customer pays on average for one reservation.
Thanks for your time
You appear to want to aggregate twice:
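-- inner query: one AVG(price) per reservation; outer query: the average of those per-reservation averages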
SELECT AVG( avg_price ) avg_avg_price
FROM (
SELECT AVG( price ) AS avg_price
FROM invoice
GROUP BY reservation_reservation_id
)
Which, for the sample data:
CREATE TABLE invoice ( reservation_reservation_id, price ) AS
SELECT 1, 10 FROM DUAL UNION ALL
SELECT 1, 12 FROM DUAL UNION ALL
SELECT 1, 14 FROM DUAL UNION ALL
SELECT 1, 16 FROM DUAL UNION ALL
SELECT 2, 10 FROM DUAL UNION ALL
SELECT 2, 11 FROM DUAL UNION ALL
SELECT 2, 12 FROM DUAL;
Outputs:
AVG_AVG_PRICE
12
db<>fiddle here
If you want this per customer:
SELECT customer_customer_id, AVG(avg_reservation_price)
FROM (SELECT i.customer_customer_id, i.reservation_reservation_id,
AVG(i.price) as avg_reservation_price
FROM invoice i
GROUP BY i.customer_customer_id, i.reservation_reservation_id
) ir
GROUP BY customer_customer_id;
If you want this for a particular "checkup type" -- which is the closest I imagine to what "service" means -- then join in the reservations table and filter:
SELECT customer_customer_id, AVG(avg_reservation_price)
FROM (SELECT i.customer_customer_id, i.reservation_reservation_id,
AVG(i.price) as avg_reservation_price
FROM invoice i JOIN
reservation r
ON i.reservation_reservation_id = r.reservation_id
WHERE r.checkup_type = ?
GROUP BY i.customer_customer_id, i.reservation_reservation_id
) ir
GROUP BY customer_customer_id;
You might want to try the below:
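-- avg(val) is the average per gr group; avg(avg(val)) over () then averages those group averages across the whole result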
with aux (gr, subgr, val) as (
select 'a', 'a1', 1 from dual union all
select 'a', 'a2', 2 from dual union all
select 'a', 'a3', 3 from dual union all
select 'a', 'a4', 4 from dual union all
select 'b', 'b1', 5 from dual union all
select 'b', 'b2', 6 from dual union all
select 'b', 'b3', 7 from dual union all
select 'b', 'b4', 8 from dual)
SELECT
gr,
avg(val) average_gr,
avg(avg(val)) over () average_total
FROM
aux
group by gr;
Which, applied to your table, would result in:
SELECT
reservation_id,
avg(price) average_rn,
avg(avg(price)) over () average_total
FROM
invoices
group by reservation_id;
This question already has answers here:
Fetch the rows which have the Max value for a column for each distinct value of another column
(35 answers)
Closed last year.
I'm using the query:
select
SGB_ID,
max(SGB_TERM_CODE_EFF) max_term,
SGB_TYP_CODE
from SGB
group by
SGB_ID,
SGB_TYP_CODE
order by 1
I'm getting multiple rows, as the SGB_TYP_CODE has different values. I just want the result from the maximum term. I've tried using 'keep dense_rank', but I can't get it to work.
Thanks.
Here is how to do that with MAX()...KEEP():
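-- KEEP (DENSE_RANK FIRST ORDER BY sgb_term_code_eff DESC) restricts MAX(sgb_typ_code)
-- to the row(s) that have the highest sgb_term_code_eff within each sgb_id group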
SELECT sgb_id,
MAX (sgb_term_code_eff) max_term,
MAX (sgb_typ_code)
KEEP ( DENSE_RANK FIRST
ORDER BY sgb_term_code_eff DESC ) sgb_typ_code
FROM sgb
GROUP BY sgb_id
ORDER BY 1
Full example:
with sgb ( sgb_id, sgb_term_code_eff, sgb_typ_code ) AS
( SELECT 1, 'A', 'ACODE' FROM DUAL UNION ALL
SELECT 1, 'B', 'BCODE' FROM DUAL UNION ALL
SELECT 1, 'Z', 'ZCODE' FROM DUAL UNION ALL
SELECT 1, 'D', 'DCODE' FROM DUAL UNION ALL
SELECT 2, 'A', 'ACODE' FROM DUAL UNION ALL
SELECT 2, 'Q', 'QCODE' FROM DUAL UNION ALL
SELECT 2, 'Q', 'QCODE' FROM DUAL UNION ALL
SELECT 3, 'A', 'ACODE' FROM DUAL )
SELECT sgb_id,
MAX (sgb_term_code_eff) max_term,
MAX (sgb_typ_code) KEEP ( DENSE_RANK FIRST ORDER BY sgb_term_code_eff DESC ) sgb_typ_code
FROM sgb
GROUP BY sgb_id
ORDER BY 1
SGB_ID MAX_TERM SGB_TYP_CODE
-------------------------------------- -------- ------------
1 Z ZCODE
2 Q QCODE
3 A ACODE
Say I have a table with columns: id, group_id, type, val
Some example data from the select:
1, 1, 'budget', 100
2, 1, 'budget adjustment', 10
3, 2, 'budget', 500
4, 2, 'budget adjustment', 30
I want the result to look like
1, 1, 'budget', 100
2, 1, 'budget adjustment', 10
5, 1, 'budget total', 110
3, 2, 'budget', 500
4, 2, 'budget adjustment', 30
6, 2, 'budget total', 530
Please advise,
Thanks.
This will get you the two added lines you want, but not the values for ID and type that you want.
Oracle examples: http://docs.oracle.com/cd/B19306_01/server.102/b14223/aggreg.htm
Select id, group_id, type as myType, sum(val) as sumVal
FROM your_table
Group by Grouping sets ((id, group_id, type, val), (group_id))
As #Serpiton suggested, it seems the functionality you're really looking for is the ability to add sub-totals to your result set, which indicates that rollup is what you need. The usage would be something like this:
SELECT id,
group_id,
coalesce(type, 'budget total') as type,
sum(val) as val
FROM your_table
-- ROLLUP of the composite (id, type) keeps the detail rows and adds one subtotal row per group_id
GROUP BY group_id, ROLLUP ((id, type))
You can use UNION ALL to add more rows to the original select.
select group_id, type, val from tableA
union all
select group_id, 'budget total' as type, sum(val) as val from tableA group by group_id
To show the right order and an id, you can use a nested select:
select rownum, group_id, type, val from (select group_id, type, val from tableA
union all
select group_id, 'budget total' as type, sum(val) as val from tableA group by group_id) order by group_id, type
with foo as
(select 1 group_id, 'budget' type, 100 val
from dual
union
select 1, 'budget adjustment', 10
from dual
union
select 2, 'budget', 500
from dual
union
select 2, 'budget adjustment', 30
from dual)
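-- grouping set (group_id, type, val) keeps the detail rows; grouping set (group_id) adds one
-- subtotal row per group_id, where type is NULL and nvl() relabels it 'budget total'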
SELECT rank() over(order by type, group_id) rk,
group_id,
nvl(type, 'budget total') as type,
sum(val) as val
FROM foo
group by Grouping sets((group_id, type, val),(group_id))
It's just a continuation of xQbert's post, to have id values as well.