Below is a row from a Snowflake query. This is the only row in the result with this information (i.e., this row is unique).
ID ACCOUNT_NUMBER DATE_1 DATE_2
123 347 2017-10-19 2017-10-29
I ran the GROUP BY below to count the number of rows in each group, and I got 3 for the above row. Shouldn't I get 1?
SELECT DISTINCT ID, ACCOUNT_NUMBER, DATE_1, DATE_2, COUNT(*)
FROM TABLE GROUP BY 1, 2, 3, 4;
ID ACCOUNT_NUMBER DATE_1 DATE_2 COUNT
123 347 2017-10-19 2017-10-29 3
I expected to see a count of 1 for this row, but I got 3.
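The result is correct. DISTINCT is applied after the grouping, so it has no effect in the provided query. A count of 3 means the table holds three rows with exactly these values: GROUP BY collapses them into one group, and DISTINCT then has nothing left to deduplicate.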
From the Snowflake documentation:
Typically, a SELECT statement’s clauses are evaluated in the order shown below:
From
Where
Group by
Having
Window
QUALIFY
Distinct
Order by
Limit
Both of the queries below produce the same result:
SELECT DISTINCT ID, ACCOUNT_NUMBER, DATE_1, DATE_2, COUNT(*)
FROM TAB
GROUP BY 1, 2, 3, 4;
SELECT ID, ACCOUNT_NUMBER, DATE_1, DATE_2, COUNT(*)
FROM TAB
GROUP BY 1, 2, 3, 4;
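A minimal sketch to check this locally, assuming SQLite's handling of GROUP BY followed by DISTINCT matches Snowflake's for this simple case (SQLite is only a stand-in here; the table `tab` and its three identical rows mirror the sample data at the end of this answer):

```python
import sqlite3

# In-memory SQLite database standing in for Snowflake (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tab (id INTEGER, account_number INTEGER, date_1 TEXT, date_2 TEXT)"
)
# Three identical rows, so the single group contains 3 rows.
conn.executemany(
    "INSERT INTO tab VALUES (?, ?, ?, ?)",
    [(123, 347, "2017-10-19", "2017-10-29")] * 3,
)

with_distinct = conn.execute(
    """SELECT DISTINCT id, account_number, date_1, date_2, COUNT(*)
       FROM tab GROUP BY id, account_number, date_1, date_2"""
).fetchall()
without_distinct = conn.execute(
    """SELECT id, account_number, date_1, date_2, COUNT(*)
       FROM tab GROUP BY id, account_number, date_1, date_2"""
).fetchall()

# DISTINCT runs after GROUP BY, so it only sees the already-unique group rows.
print(with_distinct)                      # [(123, 347, '2017-10-19', '2017-10-29', 3)]
print(with_distinct == without_distinct)  # True
```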
To apply DISTINCT before the grouping, it has to be done in a subquery:
SELECT ID, ACCOUNT_NUMBER, DATE_1, DATE_2, COUNT(*)
FROM (SELECT DISTINCT ID, ACCOUNT_NUMBER, DATE_1, DATE_2 FROM TAB)
GROUP BY 1, 2, 3, 4;
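The same stand-in SQLite setup (an assumption, not Snowflake itself) shows that deduplicating inside the subquery brings the count down to 1:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tab (id INTEGER, account_number INTEGER, date_1 TEXT, date_2 TEXT)"
)
conn.executemany(
    "INSERT INTO tab VALUES (?, ?, ?, ?)",
    [(123, 347, "2017-10-19", "2017-10-29")] * 3,
)

# Deduplicate first (inner query), then group and count (outer query).
rows = conn.execute(
    """SELECT id, account_number, date_1, date_2, COUNT(*)
       FROM (SELECT DISTINCT id, account_number, date_1, date_2 FROM tab)
       GROUP BY id, account_number, date_1, date_2"""
).fetchall()
print(rows)  # [(123, 347, '2017-10-19', '2017-10-29', 1)]
```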
or inside the aggregate function:
SELECT ID, ACCOUNT_NUMBER, DATE_1, DATE_2,
COUNT(DISTINCT ID, ACCOUNT_NUMBER, DATE_1, DATE_2)
FROM TAB
GROUP BY 1, 2, 3, 4;
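SQLite's COUNT(DISTINCT ...) accepts only one argument, so instead of a database stand-in, this plain-Python sketch illustrates what the multi-column form counts per group: the number of distinct column tuples. It is an illustration only and ignores NULL handling.

```python
from collections import defaultdict

# Hypothetical in-memory rows mirroring the sample data: three identical tuples.
rows = [(123, 347, "2017-10-19", "2017-10-29")] * 3

# Analogue of COUNT(DISTINCT id, account_number, date_1, date_2) ... GROUP BY 1, 2, 3, 4:
# collect the distinct column tuples seen in each group, then count them.
distinct_per_group = defaultdict(set)
for row in rows:
    group_key = row  # the GROUP BY columns are the same four columns
    distinct_per_group[group_key].add(row)

counts = {key: len(seen) for key, seen in distinct_per_group.items()}
print(counts)  # {(123, 347, '2017-10-19', '2017-10-29'): 1}
```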
With this sample data, the first two queries return a count of 3 and the last two return 1:
CREATE OR REPLACE TABLE TAB(ID INT,
ACCOUNT_NUMBER INT,
DATE_1 TEXT,
DATE_2 TEXT)
AS
SELECT 123, 347, '2017-10-19', '2017-10-29' UNION ALL
SELECT 123, 347, '2017-10-19', '2017-10-29' UNION ALL
SELECT 123, 347, '2017-10-19', '2017-10-29';
In my database, which represents a car service station, I am trying to write a SQL query that gives me the total average of how much a customer pays for a single service. Instead of taking AVG() of the price over all existing invoices, I want to group the invoices by the same reservation_id and then get the total average of all of those grouped results.
I am using the two tables shown in the picture below. I want the total average price obtained by applying AVG() to all of the averages computed by grouping prices by the same FK Reservation_reservation_id.
I tried to do this in a single query but failed, so I came looking for help from more experienced users. Also, I need to select only the total average. This result should give me an overview of how much each customer pays on average for one reservation.
Thanks for your time
You appear to want to aggregate twice:
SELECT AVG( avg_price ) avg_avg_price
FROM (
SELECT AVG( price ) AS avg_price
FROM invoice
GROUP BY reservation_reservation_id
)
Which, for the sample data:
CREATE TABLE invoice ( reservation_reservation_id, price ) AS
SELECT 1, 10 FROM DUAL UNION ALL
SELECT 1, 12 FROM DUAL UNION ALL
SELECT 1, 14 FROM DUAL UNION ALL
SELECT 1, 16 FROM DUAL UNION ALL
SELECT 2, 10 FROM DUAL UNION ALL
SELECT 2, 11 FROM DUAL UNION ALL
SELECT 2, 12 FROM DUAL;
Outputs:
AVG_AVG_PRICE
12
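To see why this differs from a plain AVG(price), here is a small sketch that uses Python's sqlite3 as a stand-in for Oracle (an assumption; SQLite needs no DUAL). It loads the same sample rows and computes both values:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE invoice (reservation_reservation_id INTEGER, price INTEGER)")
conn.executemany(
    "INSERT INTO invoice VALUES (?, ?)",
    [(1, 10), (1, 12), (1, 14), (1, 16), (2, 10), (2, 11), (2, 12)],
)

# Average of the per-reservation averages: (13 + 11) / 2
avg_of_avgs = conn.execute(
    """SELECT AVG(avg_price) FROM (
           SELECT AVG(price) AS avg_price
           FROM invoice
           GROUP BY reservation_reservation_id)"""
).fetchone()[0]

# Plain average over all invoices: 85 / 7
plain_avg = conn.execute("SELECT AVG(price) FROM invoice").fetchone()[0]

print(avg_of_avgs)          # 12.0
print(round(plain_avg, 4))  # 12.1429
```

The average of averages gives every reservation equal weight, whereas the plain average weights each reservation by its number of invoices.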
If you want this per customer:
SELECT customer_customer_id, AVG(avg_reservation_price)
FROM (SELECT i.customer_customer_id, i.reservation_reservation_id,
AVG(i.price) as avg_reservation_price
FROM invoice i
GROUP BY i.customer_customer_id, i.reservation_reservation_id
) ir
GROUP BY customer_customer_id;
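A quick sanity check of the per-customer version, using SQLite as a stand-in. The data is hypothetical: the customer_customer_id column on invoice is the query's own assumption, and the rows below are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE invoice (customer_customer_id INTEGER, "
    "reservation_reservation_id INTEGER, price REAL)"
)
# Customer 1: reservation 1 averages 15, reservation 2 averages 30.
# Customer 2: reservation 3 averages 10.
conn.executemany(
    "INSERT INTO invoice VALUES (?, ?, ?)",
    [(1, 1, 10), (1, 1, 20), (1, 2, 30), (2, 3, 5), (2, 3, 15)],
)

rows = conn.execute(
    """SELECT customer_customer_id, AVG(avg_reservation_price)
       FROM (SELECT i.customer_customer_id, i.reservation_reservation_id,
                    AVG(i.price) AS avg_reservation_price
             FROM invoice i
             GROUP BY i.customer_customer_id, i.reservation_reservation_id) ir
       GROUP BY customer_customer_id
       ORDER BY customer_customer_id"""
).fetchall()
print(rows)  # [(1, 22.5), (2, 10.0)]
```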
If you want this for a particular "checkup type" (which is the closest thing I can imagine "service" means), then join in the reservation table and filter:
SELECT customer_customer_id, AVG(avg_reservation_price)
FROM (SELECT i.customer_customer_id, i.reservation_reservation_id,
AVG(i.price) as avg_reservation_price
FROM invoice i JOIN
reservation r
ON i.reservation_reservation_id = r.reservation_id
WHERE r.checkup_type = ?
GROUP BY i.customer_customer_id, i.reservation_reservation_id
) ir
GROUP BY customer_customer_id;
You might want to try the following, which combines GROUP BY with an analytic AVG() over the group averages:
with aux (gr, subgr, val) as (
select 'a', 'a1', 1 from dual union all
select 'a', 'a2', 2 from dual union all
select 'a', 'a3', 3 from dual union all
select 'a', 'a4', 4 from dual union all
select 'b', 'b1', 5 from dual union all
select 'b', 'b2', 6 from dual union all
select 'b', 'b3', 7 from dual union all
select 'b', 'b4', 8 from dual)
SELECT
gr,
avg(val) average_gr,
avg(avg(val)) over () average_total
FROM
aux
group by gr;
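SQLite (used here only as a stand-in; window functions need SQLite 3.25 or later) can express the same idea by computing the group averages in a subquery and then averaging them with an empty OVER () window. This is an equivalent rewrite for illustration, not the exact query above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE aux (gr TEXT, subgr TEXT, val INTEGER)")
conn.executemany(
    "INSERT INTO aux VALUES (?, ?, ?)",
    [("a", "a1", 1), ("a", "a2", 2), ("a", "a3", 3), ("a", "a4", 4),
     ("b", "b1", 5), ("b", "b2", 6), ("b", "b3", 7), ("b", "b4", 8)],
)

# Group averages first; the empty OVER () window then averages them across all groups.
rows = conn.execute(
    """SELECT gr, average_gr, AVG(average_gr) OVER () AS average_total
       FROM (SELECT gr, AVG(val) AS average_gr FROM aux GROUP BY gr)
       ORDER BY gr"""
).fetchall()
print(rows)  # [('a', 2.5, 4.5), ('b', 6.5, 4.5)]
```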
Applied to your table, this would be:
SELECT
reservation_reservation_id,
avg(price) average_rn,
avg(avg(price)) over () average_total
FROM
invoices
group by reservation_reservation_id;
I have a table like this:
Now I want to map row values to columns with a SELECT statement.
The result should look like this:
You can do something like this.
Select * from (select id, person_id, f_device, report_date, originaldate from my_table)
pivot
(
count(*)
for person_id in (1, 2, 3, 4)
);
That is, you need an aggregate function, and the result will contain the columns from the subquery that are not used in the PIVOT clause, plus one column per value in the FOR ... IN list. This query counts rows for person_id in (1, 2, 3, 4) and implicitly groups by the remaining columns.
Your question is not entirely clear, so if this is not what you need, please try to improve it.
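SQLite has no PIVOT clause, so this sketch swaps in conditional aggregation, a portable equivalent of count(*) ... for person_id in (1, 2, 3, 4). The rows in my_table are hypothetical, and only person_id and report_date are kept for brevity:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (person_id INTEGER, report_date TEXT)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?)",
    [(1, "2021-01-01"), (2, "2021-01-01"), (2, "2021-01-01"),
     (3, "2021-01-02"), (4, "2021-01-02")],
)

# One output column per person_id value; COUNT skips the NULLs that CASE
# produces when a row belongs to a different person.
rows = conn.execute(
    """SELECT report_date,
              COUNT(CASE WHEN person_id = 1 THEN 1 END) AS p1,
              COUNT(CASE WHEN person_id = 2 THEN 1 END) AS p2,
              COUNT(CASE WHEN person_id = 3 THEN 1 END) AS p3,
              COUNT(CASE WHEN person_id = 4 THEN 1 END) AS p4
       FROM my_table
       GROUP BY report_date
       ORDER BY report_date"""
).fetchall()
print(rows)  # [('2021-01-01', 1, 2, 0, 0), ('2021-01-02', 0, 0, 1, 1)]
```

Like PIVOT, the remaining non-pivoted column (report_date here) becomes the grouping key.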
Update
WITH cte as ( select personid id, reportdate c_date, listagg(originaldate||' ') og FROM my_table GROUP BY personid, reportdate),
cte2 as (SELECT id, c_date, regexp_substr(og, '(.*?)( |$)', 1, 1 ) one,
regexp_substr(og, '(.*?)( |$)', 1, 2) two,
regexp_substr(og, '(.*?)( |$)', 1, 3 ) three,
regexp_substr(og, '(.*?)( |$)', 1, 4 ) four
from cte)
SELECT * FROM cte2;
If you want one row per personid, then reportdate also needs to be aggregated (here with MAX):
WITH cte as ( select personid id, max(reportdate ) c_date, listagg(originaldate||' ') WITHIN GROUP(ORDER BY originaldate ) og FROM my_table GROUP BY personid),
cte2 as (SELECT id, c_date, regexp_substr(og, '(.*?)( |$)', 1, 1 ) one,
regexp_substr(og, '(.*?)( |$)', 1, 2) two,
regexp_substr(og, '(.*?)( |$)', 1, 3 ) three,
regexp_substr(og, '(.*?)( |$)', 1, 4 ) four
from cte)
SELECT * FROM cte2;
I need to aggregate consecutive values in a table with BigQuery, as shown in the example below.
segment can only be 'A' or 'B', and value is a STRING.
Basically, for each id I need to consider only the rows where segment = 'A', taking the gaps into account.
The rows should be ordered by date_column ASC.
Example
id, segment, value, date_column
1, A, 3, daytime
1, A, 2, daytime
1, A, x, daytime
1, B, 3, daytime
1, B, 3, daytime
1, B, 3, daytime
1, A, 7, daytime
1, A, 3, daytime
1, B, 3, daytime
1, A, 9, daytime
1, A, 9, daytime
2, A, 3, daytime
2, B, 3, daytime
2, A, 3, daytime
2, A, m, daytime
Expected result
id, agg_values_A_segment
1, ['32x', '73', '99']
2, ['3', '3m']
How can I achieve this result?
I'm struggling with the 'gap' between the segments.
Below are two options for BigQuery Standard SQL.
Option 1 - using analytic (window) functions
#standardSQL
SELECT id, ARRAY_AGG(values_in_group ORDER BY grp) agg_values_A_segment
FROM (
SELECT id, grp, STRING_AGG(value, '' ORDER BY date_column) values_in_group
FROM (
SELECT id, segment, value, date_column, flag,
COUNTIF(flag) OVER(PARTITION BY id ORDER BY date_column) grp
FROM (
SELECT *, IFNULL(LAG(segment) OVER(PARTITION BY id ORDER BY date_column), segment) != segment flag
FROM `project.dataset.table`
)
)
WHERE segment = 'A'
GROUP BY id, grp
)
GROUP BY id
You can test and play with the above using the sample data from your question, as in the example below:
#standardSQL
WITH `project.dataset.table` AS (
SELECT 1 id, 'A' segment, '3' value, DATETIME '2019-01-07T18:46:21' date_column UNION ALL
SELECT 1, 'A', '2', '2019-01-07T18:46:22' UNION ALL
SELECT 1, 'A', 'x', '2019-01-07T18:46:23' UNION ALL
SELECT 1, 'B', '3', '2019-01-07T18:46:24' UNION ALL
SELECT 1, 'B', '3', '2019-01-07T18:46:25' UNION ALL
SELECT 1, 'B', '3', '2019-01-07T18:46:26' UNION ALL
SELECT 1, 'A', '7', '2019-01-07T18:46:27' UNION ALL
SELECT 1, 'A', '3', '2019-01-07T18:46:28' UNION ALL
SELECT 1, 'B', '3', '2019-01-07T18:46:29' UNION ALL
SELECT 1, 'A', '9', '2019-01-07T18:46:30' UNION ALL
SELECT 1, 'A', '9', '2019-01-07T18:46:31' UNION ALL
SELECT 2, 'A', '3', '2019-01-07T18:46:32' UNION ALL
SELECT 2, 'B', '3', '2019-01-07T18:46:33' UNION ALL
SELECT 2, 'A', '3', '2019-01-07T18:46:34' UNION ALL
SELECT 2, 'A', 'm', '2019-01-07T18:46:35'
)
SELECT id, ARRAY_AGG(values_in_group ORDER BY grp) agg_values_A_segment
FROM (
SELECT id, grp, STRING_AGG(value, '' ORDER BY date_column) values_in_group
FROM (
SELECT id, segment, value, date_column, flag,
COUNTIF(flag) OVER(PARTITION BY id ORDER BY date_column) grp
FROM (
SELECT *, IFNULL(LAG(segment) OVER(PARTITION BY id ORDER BY date_column), segment) != segment flag
FROM `project.dataset.table`
)
)
WHERE segment = 'A'
GROUP BY id, grp
)
GROUP BY id
-- ORDER BY id
with the result:
Row | id | agg_values_A_segment
1 | 1 | ['32x', '73', '99']
2 | 2 | ['3', '3m']
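The logic of Option 1 can be traced in plain Python (an illustration, not BigQuery code). The date_column values are replaced by the seconds of the sample timestamps, and the hypothetical helper agg_a_segments mirrors the LAG flag and the running COUNTIF:

```python
# (id, segment, value, date_column) from the question's sample; the date is
# reduced to the seconds part of each timestamp (assumption for brevity).
rows = [
    (1, "A", "3", 21), (1, "A", "2", 22), (1, "A", "x", 23),
    (1, "B", "3", 24), (1, "B", "3", 25), (1, "B", "3", 26),
    (1, "A", "7", 27), (1, "A", "3", 28), (1, "B", "3", 29),
    (1, "A", "9", 30), (1, "A", "9", 31),
    (2, "A", "3", 32), (2, "B", "3", 33), (2, "A", "3", 34), (2, "A", "m", 35),
]


def agg_a_segments(rows):
    result = {}
    for rid in sorted({r[0] for r in rows}):
        # PARTITION BY id ORDER BY date_column
        part = sorted((r for r in rows if r[0] == rid), key=lambda r: r[3])
        groups = {}  # grp -> concatenated values of one 'A' run
        grp, prev = 0, None
        for _, segment, value, _ in part:
            # flag = IFNULL(LAG(segment), segment) != segment
            flag = prev is not None and prev != segment
            grp += flag  # COUNTIF(flag) OVER (... ORDER BY date_column)
            prev = segment
            if segment == "A":  # WHERE segment = 'A'
                groups[grp] = groups.get(grp, "") + value
        # ARRAY_AGG(values_in_group ORDER BY grp)
        result[rid] = [groups[g] for g in sorted(groups)]
    return result


print(agg_a_segments(rows))  # {1: ['32x', '73', '99'], 2: ['3', '3m']}
```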
Option 2 - the above option should work for large numbers of rows per id, but it looks a little heavy. This second option is simpler, but it assumes you have some character or sequence of characters that is guaranteed not to appear when your values are concatenated, for example a pipe or tab character. In the example below I use the word 'delimiter', assuming it will not appear in the concatenated result.
#standardSQL
SELECT id,
ARRAY(SELECT part FROM UNNEST(parts) part WITH OFFSET pos WHERE part != '' ORDER BY pos) agg_values_A_segment
FROM (
SELECT id,
SPLIT(STRING_AGG(IF(segment = 'A', value, 'delimiter'), '' ORDER BY date_column), 'delimiter') parts
FROM `project.dataset.table`
GROUP BY id
)
You can test and play with the above using the same sample data:
#standardSQL
WITH `project.dataset.table` AS (
SELECT 1 id, 'A' segment, '3' value, DATETIME '2019-01-07T18:46:21' date_column UNION ALL
SELECT 1, 'A', '2', '2019-01-07T18:46:22' UNION ALL
SELECT 1, 'A', 'x', '2019-01-07T18:46:23' UNION ALL
SELECT 1, 'B', '3', '2019-01-07T18:46:24' UNION ALL
SELECT 1, 'B', '3', '2019-01-07T18:46:25' UNION ALL
SELECT 1, 'B', '3', '2019-01-07T18:46:26' UNION ALL
SELECT 1, 'A', '7', '2019-01-07T18:46:27' UNION ALL
SELECT 1, 'A', '3', '2019-01-07T18:46:28' UNION ALL
SELECT 1, 'B', '3', '2019-01-07T18:46:29' UNION ALL
SELECT 1, 'A', '9', '2019-01-07T18:46:30' UNION ALL
SELECT 1, 'A', '9', '2019-01-07T18:46:31' UNION ALL
SELECT 2, 'A', '3', '2019-01-07T18:46:32' UNION ALL
SELECT 2, 'B', '3', '2019-01-07T18:46:33' UNION ALL
SELECT 2, 'A', '3', '2019-01-07T18:46:34' UNION ALL
SELECT 2, 'A', 'm', '2019-01-07T18:46:35'
)
SELECT id,
ARRAY(SELECT part FROM UNNEST(parts) part WITH OFFSET pos WHERE part != '' ORDER BY pos) agg_values_A_segment
FROM (
SELECT id,
SPLIT(STRING_AGG(IF(segment = 'A', value, 'delimiter'), '' ORDER BY date_column), 'delimiter') parts
FROM `project.dataset.table`
GROUP BY id
)
-- ORDER BY id
with, obviously, the same result:
Row | id | agg_values_A_segment
1 | 1 | ['32x', '73', '99']
2 | 2 | ['3', '3m']
Note: the second option can fail with a "resources exceeded" error when there are too many rows per id, so test it on your real data.
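A plain-Python sketch of the Option 2 trick (the helper agg_a_segments_split is hypothetical, and the seconds of the sample timestamps stand in for date_column) also shows why the delimiter must never occur in the data:

```python
# (id, segment, value, date_column) from the question's sample; seconds only.
rows = [
    (1, "A", "3", 21), (1, "A", "2", 22), (1, "A", "x", 23),
    (1, "B", "3", 24), (1, "B", "3", 25), (1, "B", "3", 26),
    (1, "A", "7", 27), (1, "A", "3", 28), (1, "B", "3", 29),
    (1, "A", "9", 30), (1, "A", "9", 31),
    (2, "A", "3", 32), (2, "B", "3", 33), (2, "A", "3", 34), (2, "A", "m", 35),
]


def agg_a_segments_split(rows, delimiter="delimiter"):
    result = {}
    for rid in sorted({r[0] for r in rows}):
        part = sorted((r for r in rows if r[0] == rid), key=lambda r: r[3])
        # STRING_AGG(IF(segment = 'A', value, 'delimiter'), '' ORDER BY date_column)
        joined = "".join(v if seg == "A" else delimiter for _, seg, v, _ in part)
        # SPLIT(..., 'delimiter'), then drop the empty parts left by runs of 'B'
        result[rid] = [p for p in joined.split(delimiter) if p != ""]
    return result


print(agg_a_segments_split(rows))                 # {1: ['32x', '73', '99'], 2: ['3', '3m']}
# A delimiter that also appears in the values silently corrupts the result:
print(agg_a_segments_split(rows, delimiter="3"))  # {1: ['2x', '7', '99'], 2: ['m']}
```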
SQL tables represent unordered sets. This is particularly true in a parallel, columnar database such as BigQuery. The rest of this answer assumes you have a column that specifies the ordering of the rows.
This is a gaps-and-islands problem. You can use the difference between two row_number() values to identify the adjacent groups, and then aggregate:
select id, array_agg(vals order by min_ordercol) as agg_values_A_segment
from (select id, segment, string_agg(value, '' order by date_column) as vals,
             min(date_column) as min_ordercol
      from (select t.*,
                   row_number() over (partition by id order by date_column) as seqnum,
                   row_number() over (partition by id, segment order by date_column) as seqnum_2
            from t
           ) t
      where segment = 'A'
      group by id, segment, (seqnum - seqnum_2)
     ) x
group by id;
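The difference-of-row_number() idea can be traced in plain Python (illustration only; islands is a hypothetical helper, and the seconds of the sample timestamps stand in for date_column). Within one run of a segment both counters advance together, so seqnum - seqnum_2 stays constant; it jumps whenever rows from the other segment intervene:

```python
from collections import defaultdict

# (id, segment, value, date_column) from the question's sample; seconds only.
rows = [
    (1, "A", "3", 21), (1, "A", "2", 22), (1, "A", "x", 23),
    (1, "B", "3", 24), (1, "B", "3", 25), (1, "B", "3", 26),
    (1, "A", "7", 27), (1, "A", "3", 28), (1, "B", "3", 29),
    (1, "A", "9", 30), (1, "A", "9", 31),
    (2, "A", "3", 32), (2, "B", "3", 33), (2, "A", "3", 34), (2, "A", "m", 35),
]


def islands(rows):
    seqnum = defaultdict(int)    # row_number() over (partition by id order by date_column)
    seqnum_2 = defaultdict(int)  # row_number() over (partition by id, segment order by date_column)
    groups = {}  # (id, segment, seqnum - seqnum_2) -> [min date, concatenated values]
    for rid, seg, val, date in sorted(rows, key=lambda r: (r[0], r[3])):
        seqnum[rid] += 1
        seqnum_2[(rid, seg)] += 1
        key = (rid, seg, seqnum[rid] - seqnum_2[(rid, seg)])
        if key not in groups:
            groups[key] = [date, ""]  # first row seen is the island's min date
        groups[key][1] += val
    result = defaultdict(list)
    # Keep only 'A' islands, ordered by their first date (min_ordercol).
    for (rid, seg, _), (_min_date, vals) in sorted(
        groups.items(), key=lambda kv: (kv[0][0], kv[1][0])
    ):
        if seg == "A":
            result[rid].append(vals)
    return dict(result)


print(islands(rows))  # {1: ['32x', '73', '99'], 2: ['3', '3m']}
```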