Error with BigQuery syntax when I want to use Group By - google-bigquery

I have this query:
SELECT
SPLIT(jsonPayload.message, ",")[offset(1)] as visa,
SPLIT(jsonPayload.message, ",")[offset(3)] as action,
timestamp as time_acton
FROM `my_dataset.rstudio_logs.AUDIT_LOGS_20230109`
WHERE jsonPayload.message like '%session_file%'
And I got this table:
+----------+-----------------------+--------------------------------+
| visa | action | time_acton |
+----------+-----------------------+--------------------------------+
| "VISA_01"| session_file_upload | 2023-01-09 16:27:30.375298 UTC |
+----------+-----------------------+--------------------------------+
| "VISA_01"| session_file_download | 2023-01-09 16:33:13.650860 UTC |
+----------+-----------------------+--------------------------------+
| "VISA_02"| session_file_download | 2023-01-09 16:33:27.902632 UTC |
+----------+-----------------------+--------------------------------+
| "VISA_01"| session_file_download | 2023-01-09 16:33:27.903459 UTC |
+----------+-----------------------+--------------------------------+
| "VISA_02"| session_file_download | 2023-01-09 16:33:27.902632 UTC |
+----------+-----------------------+--------------------------------+
| "VISA_01"| session_file_download | 2023-01-09 16:33:27.903459 UTC |
+----------+-----------------------+--------------------------------+
But I want to group all the lines by the visa column, like this:
+----------+-----------------------+--------------------------------+
| visa | action | time_acton |
+----------+-----------------------+--------------------------------+
| "VISA_01"| session_file_upload | 2023-01-09 16:27:30.375298 UTC |
+ +-----------------------+--------------------------------+
| | session_file_download | 2023-01-09 16:33:13.650860 UTC |
+ +-----------------------+--------------------------------+
| | session_file_download | 2023-01-09 16:33:27.903459 UTC |
+ +-----------------------+--------------------------------+
| | session_file_download | 2023-01-09 16:33:27.903459 UTC |
+----------+-----------------------+--------------------------------+
| "VISA_02"| session_file_download | 2023-01-09 16:33:27.902632 UTC |
+ +-----------------------+--------------------------------+
| | session_file_download | 2023-01-09 16:33:27.902632 UTC |
+----------+-----------------------+--------------------------------+
So I tried to group by like this:
SELECT
SPLIT(jsonPayload.message, ",")[offset(1)] as visa,
SPLIT(jsonPayload.message, ",")[offset(3)] as action,
timestamp as time_acton
FROM `ops-center-axe-dev-9561.rstudio_vms_logs.WORKBENCH_SESSION_AUDIT_LOGS_20230109`
WHERE jsonPayload.message like '%session_file%'
GROUP BY visa
But I got this error:
SELECT list expression references jsonPayload.message which is neither grouped nor aggregated
Can someone help me, please?

There seems to be a misunderstanding here about what a GROUP BY accomplishes. When you group by, you group rows together, generally to aggregate the data or reduce duplicates.
From your example, it looks like you just want to order your data by visa and time_acton.
You can replace the GROUP BY with ORDER BY visa, time_acton
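Applied to the query from the question (same table and filter), that would look like:

```sql
SELECT
  SPLIT(jsonPayload.message, ",")[OFFSET(1)] AS visa,
  SPLIT(jsonPayload.message, ",")[OFFSET(3)] AS action,
  timestamp AS time_acton
FROM `my_dataset.rstudio_logs.AUDIT_LOGS_20230109`
WHERE jsonPayload.message LIKE '%session_file%'
ORDER BY visa, time_acton
```

This keeps every row but returns them sorted, so all rows for the same visa appear together, matching the desired layout.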

Related

Retrieve unknown values based on non-schema specific sequence of column values

I want to return and operate on time values based on their related event values, but only if a specific sequence of events occurs. A simplified example table below:
+--------+------------+-------+-------------+-------+-------------+-------+-------------+-------+-------------+-------+
| id | event1 | time1 | event2 | time2 | event3 | time3 | event4 | time4 | event5 | time5 |
+--------+------------+-------+-------------+-------+-------------+-------+-------------+-------+-------------+-------+
| abc123 | firstevent | 10:00 | secondevent | 10:01 | thirdevent | 10:02 | fourthevent | 10:03 | fifthevent | 10:04 |
| abc123 | thirdevent | 10:10 | secondevent | 10:11 | thirdevent | 10:12 | firstevent | 10:13 | secondevent | 10:14 |
| def456 | thirdevent | 10:20 | firstevent | 10:21 | secondevent | 10:22 | thirdevent | 10:24 | fifthevent | 10:25 |
+--------+------------+-------+-------------+-------+-------------+-------+-------------+-------+-------------+-------+
For this table we want to retrieve the times whenever this particular sequence of events occurs: firstevent, secondevent, thirdevent, and a final event of any non-zero value. Meaning the relevant entries returned would be the following:
+--------+------------+-------+-------------+-------+-------------+-------+-------------+-------+------------+-------+
| id | event1 | time1 | event2 | time2 | event3 | time3 | event4 | time4 | event5 | time5 |
+--------+------------+-------+-------------+-------+-------------+-------+-------------+-------+------------+-------+
| abc123 | firstevent | 10:00 | secondevent | 10:01 | thirdevent | 10:02 | fourthevent | 10:03 | null | null |
| null | null | null | null | null | null | null | null | null | null | null |
| def456 | null | null | firstevent | 10:21 | secondevent | 10:22 | thirdevent | 10:24 | fifthevent | 10:26 |
+--------+------------+-------+-------------+-------+-------------+-------+-------------+-------+------------+-------+
As shown above, the columns are irrelevant to the occurrence of the sequence: the two results start in the event1 and event2 columns respectively, so the solution should be column-independent and support n columns. These values can then be aggregated by the final non-zero event that occurs after the 3 fixed values in the sequence, to give something like the following:
+-------------+-------------------------------+
| FinalEvent | AverageTimeBetweenFinalEvents |
+-------------+-------------------------------+
| fourthevent | 1:00 |
| fifthevent | 2:00 |
+-------------+-------------------------------+
Below is for BigQuery Standard SQL
#standardSQL
WITH search_events AS (
  SELECT ['firstevent', 'secondevent', 'thirdevent'] search
), temp AS (
  SELECT *, REGEXP_EXTRACT(events, CONCAT(search, r',(\w*)')) FinalEvent
  FROM (
    SELECT id, [time1, time2, time3, time4, time5] times,
      (SELECT STRING_AGG(event) FROM UNNEST([event1, event2, event3, event4, event5]) event) events,
      (SELECT STRING_AGG(search) FROM UNNEST(search) search) search
    FROM `project.dataset.table`, search_events
  )
)
SELECT FinalEvent,
  times[SAFE_OFFSET(ARRAY_LENGTH(REGEXP_EXTRACT_ALL(REGEXP_EXTRACT(events, CONCAT(r'(.*?)', search, ',', FinalEvent)), ',')) + 3)] time
FROM temp
WHERE IFNULL(FinalEvent, '') != ''
Applied to the sample data from your question, the result is:
Row FinalEvent time
1 fourthevent 10:03
2 fifthevent 10:25
So, as you can see, all final events are extracted along with their respective times.
Now you can do whatever analytics you need here. I was not sure about the logic behind AverageTimeBetweenFinalEvents, so I am leaving that to you, especially since I think the main focus of the question was the extraction of those final events.
Would you be able to provide the logic behind this statement, please?
times[SAFE_OFFSET(ARRAY_LENGTH(REGEXP_EXTRACT_ALL(REGEXP_EXTRACT(events, CONCAT(r'(.*?)', search, ',', FinalEvent )), ',')) + 3)] time
Sure, I hope the breakdown below helps explain the logic behind that expression:
1. Assemble a regular expression to extract the list of events that happened before the matched events
2. Extract those events
3. Extract all commas from that list into an array
4. Calculate the position of the final event by taking the number of commas in the above array + 3 (three reflects the number of positions in the search sequence)
5. Extract the respective time as an element of the times array
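To see those steps in isolation, you can run the expression against a single hard-coded row (values taken from the first sample row of the question):

```sql
#standardSQL
SELECT times[SAFE_OFFSET(
         ARRAY_LENGTH(REGEXP_EXTRACT_ALL(
           REGEXP_EXTRACT(events, CONCAT(r'(.*?)', search, ',', FinalEvent)),
           ',')) + 3)] AS time
FROM (
  SELECT
    'firstevent,secondevent,thirdevent,fourthevent,fifthevent' AS events,
    'firstevent,secondevent,thirdevent' AS search,
    'fourthevent' AS FinalEvent,
    ['10:00', '10:01', '10:02', '10:03', '10:04'] AS times
)
```

Here the prefix before the match is empty, so the comma count is 0 and the offset is 0 + 3 = 3, which should yield '10:03', the time of the final event.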

How to skip null values in PostgreSQL query?

I have a query against the table "c_hw_day" in PostgreSQL:
select pr.c_period_id,
unnest(array_agg_mult(array[hd.wd1,hd.wd2,hd.wd3,hd.wd4,hd.wd5,hd.wd6,hd.wd7,hd.wd8,hd.wd9,hd.wd10,hd.hd1,hd.hd2,hd.hd2,hd.hd3,hd.hd4,hd.hd5,hd.hd6,hd.hd7,hd.hd8,hd.hd9,hd.hd10])) as wd_hd
from c_hw_day hd
left join c_period pr on (hd.c_period_id = pr.c_period_id)
group by 1
The result looks like:
| ID | Weekend |
----------+-----------------------
| 1000051 | 2018-11-30 00:00:00 |
| 1000051 | |
| 1000051 | |
| 1000051 | 2018-12-07 00:00:00 |
| 1000051 | |
| 1000051 | |
| 1000051 | |
| 1000051 | 2018-12-14 00:00:00 |
I want to skip the null value like
| ID | Weekend |
----------+-----------------------
| 1000051 | 2018-11-30 00:00:00 |
| 1000051 | 2018-12-07 00:00:00 |
| 1000051 | 2018-12-14 00:00:00 |
I would not do this using arrays. I would just use a lateral join:
select pr.c_period_id, v.wd_hd
from c_hw_day hd
left join c_period pr
  on hd.c_period_id = pr.c_period_id
cross join lateral (values
  (hd.wd1), (hd.wd2), (hd.wd3), (hd.wd4), (hd.wd5),
  (hd.wd6), (hd.wd7), (hd.wd8), (hd.wd9), (hd.wd10),
  (hd.hd1), (hd.hd2), (hd.hd3), (hd.hd4), (hd.hd5),
  (hd.hd6), (hd.hd7), (hd.hd8), (hd.hd9), (hd.hd10)
) v(wd_hd)
where v.wd_hd is not null;
This logic is much clearer. Without the outer group by, I suspect it is faster as well.
The laziest way is to put your query into a subquery.
If you don't have a lot of data, it will be fine:
select * from (
select pr.c_period_id,
unnest(array_agg_mult(array[hd.wd1,hd.wd2,hd.wd3,hd.wd4,hd.wd5,hd.wd6,hd.wd7,hd.wd8,hd.wd9,hd.wd10,hd.hd1,hd.hd2,hd.hd2,hd.hd3,hd.hd4,hd.hd5,hd.hd6,hd.hd7,hd.hd8,hd.hd9,hd.hd10])) as wd_hd
from c_hw_day hd
left join c_period pr on (hd.c_period_id = pr.c_period_id)
group by 1
)q1
where wd_hd is not null

Subtraction between two dates in different rows in SQL Access

I'm trying to find the difference between two dates that are in different columns and different rows.
With this command I compute the difference on the same row:
SELECT
FACTRY.finish_datetime,
FACTRY.start_datetime,
DateDiff("n",[finish_datetime],[start_datetime]) AS date_diff
FROM FACTRY
WHERE (((FACTRY.job_number)='30'));
The output:
+---------------------+---------------------+-----------+
| start_date_time | finish_date_time | date_diff |
+---------------------+---------------------+-----------+
| 17/08/2016 20:24:00 | 17/08/2016 20:25:00 | -1 |
| 17/08/2016 20:25:00 | 17/08/2016 21:00:00 | -35 |
| 17/08/2016 21:00:00 | 17/08/2016 21:01:00 | -1 |
| 17/08/2016 21:01:00 | 17/08/2016 21:02:00 | -1 |
+---------------------+---------------------+-----------+
In Oracle, the following script works:
SELECT
start_date,
finish_date,
LEAD(finish_date, 1) OVER (ORDER BY finish_date) AS NextFinish
FROM FACTRY
WHERE job_number = 30;
But since these functions are not available in Access, does anyone have an idea how to do this?
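Access has no window functions, but LEAD can usually be emulated with a correlated subquery that picks the smallest later value. A sketch, assuming the same FACTRY table and job_number filter as above:

```sql
SELECT f.start_datetime,
       f.finish_datetime,
       (SELECT MIN(f2.finish_datetime)
        FROM FACTRY AS f2
        WHERE f2.job_number = f.job_number
          AND f2.finish_datetime > f.finish_datetime) AS NextFinish
FROM FACTRY AS f
WHERE f.job_number = '30';
```

You can then wrap this query and apply DateDiff to finish_datetime and NextFinish to get the gap between consecutive rows.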

How to select all columns of a bigquery table

I have the following BigQuery table:
+---------------------+-----------+-------------------------+-----------------+
| links.href | links.rel | dados.dataHora | dados.sequencia |
+---------------------+-----------+-------------------------+-----------------+
| https://www.url.com | self | 2017-03-16 16:27:10 UTC | 2 |
| | | 2017-03-16 16:35:34 UTC | 1 |
| | | 2017-03-16 19:50:32 UTC | 3 |
+---------------------+-----------+-------------------------+-----------------+
and I want to select all rows, so I tried the following query:
SELECT * FROM [my_project:a_import.my_table] LIMIT 100
But I got this (sad) error:
Error: Cannot output multiple independently repeated fields at the same time. Found links_rel and dados_dataHora
Please, can anybody help me?
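That error comes from BigQuery legacy SQL (the [project:dataset.table] syntax), which cannot output two independently repeated fields at once. One way around it, assuming the table is queryable with standard SQL, is to switch dialects, since standard SQL returns repeated fields as arrays instead of flattening them:

```sql
#standardSQL
SELECT *
FROM `my_project.a_import.my_table`
LIMIT 100
```

Alternatively, staying in legacy SQL, you would have to explicitly FLATTEN one of the repeated fields before selecting.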

SQL query to get the most recent row using ASP.NET + Access

I select a date from a calendar control in ASP.NET and a time from a checkbox, and store both in the table.
Desired output: The most recent date and time is .....
table TEST
| Date_From_Calendar | TIME        |
|--------------------|-------------|
| 15/12/2014         | 09.00-12.00 |
| 18/12/2014         | 15.00-18.00 |
| 18/12/2014         | 15.00-18.00 |
| 19/12/2014         | 15.00-18.00 |
| 19/12/2014         | 12.00-15.00 |
| 19/12/2014         | 12.00-15.00 |
| 19/12/2014         | 12.00-15.00 |
| 19/12/2014         | 09.00-12.00 |
| 20/12/2014         | 09.00-12.00 |
| 24/12/2014         | 09.00-12.00 |
SELECT Date_From_Calendar , MAX(TIME) AS TIME
FROM Table
GROUP BY Date_From_Calendar
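The GROUP BY above returns one row per date rather than the single most recent row. In Access SQL, a common way to get just the latest row is TOP 1 with ORDER BY; a sketch, assuming Date_From_Calendar is a true Date/Time column (if it is stored as dd/mm/yyyy text, the sort would be alphabetical and give wrong results):

```sql
SELECT TOP 1 Date_From_Calendar, [TIME]
FROM TEST
ORDER BY Date_From_Calendar DESC, [TIME] DESC;
```

[TIME] is bracketed because TIME is a reserved word in Access.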