Trying to get a daily total of events and event params - google-bigquery

I'm trying to get my data into a table like this, covering a rolling 30-day time frame:
Event Date | Event Name | Campaign | Medium | Source | Term | Sr_Posting_ID
Date 1 | Event 1 | Value | Value | Value | Value
Date 1 | Event 2 | Value | Value | Value | Value
Date 1 | Event 3 | Value | Value | Value | Value
Date 2 | Event 1 | Value | Value | Value | Value
Date 2 | Event 2 | Value | Value | Value | Value
Date 2 | Event 3 | Value | Value | Value | Value
-------- | --------
Basically, I want an event count for every event, by day, showing the campaign, medium, etc. associated with each event.
Campaign, Source, Medium, etc. are all values of event_params.key, and each one has its own value.
Here's a picture of my data: https://imgur.com/a/j5sf8ok
What I've tried:
SELECT *
FROM `nth-glider-369017.analytics_316822874.events_*`, UNNEST(event_params) as param
WHERE event_name IN ("sr_job_application_started", "sr_job_application_continued", "sr_job_completed_application")
AND param.key IN ("term", "campaign", "source", "medium", "engaged_session_count")
AND _table_suffix BETWEEN format_date('%Y%m%d',date_sub(current_date(), interval 30 day))
AND format_date('%Y%m%d',date_sub(current_date(), interval 1 day));
However, there are 2 problems:
It's only pulling 2 days, not every day
Term, campaign, etc. are not each in their own column with their respective values
What exactly do I need to do to fix this?
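The second problem comes from the cross join with UNNEST, which yields one row per parameter rather than one row per event. As a rough illustration of the target shape (not BigQuery code), the Python sketch below pivots each event's key/value parameters into fixed columns and then counts identical rows per day. The event dictionaries, the plain-string `value` field, and the names `PARAM_KEYS`, `pivot_event` and `daily_counts` are all assumptions made for this example; the real export schema nests values differently.

```python
from collections import Counter

# Hypothetical list of parameter keys that should become columns.
PARAM_KEYS = ("campaign", "medium", "source", "term")

def pivot_event(event):
    """Turn one event with a list of key/value params into a single flat row."""
    params = {p["key"]: p["value"] for p in event["event_params"]}
    # Missing keys become None, so every row has the same columns.
    return (event["event_date"], event["event_name"]) + tuple(
        params.get(key) for key in PARAM_KEYS
    )

def daily_counts(events):
    """Count events per (date, name, campaign, medium, source, term) row."""
    return Counter(pivot_event(e) for e in events)

# Made-up sample data, for illustration only.
events = [
    {"event_date": "20221120", "event_name": "sr_job_application_started",
     "event_params": [{"key": "campaign", "value": "spring"},
                      {"key": "medium", "value": "cpc"}]},
    {"event_date": "20221120", "event_name": "sr_job_application_started",
     "event_params": [{"key": "medium", "value": "cpc"},
                      {"key": "campaign", "value": "spring"}]},
    {"event_date": "20221121", "event_name": "sr_job_completed_application",
     "event_params": [{"key": "source", "value": "google"}]},
]
counts = daily_counts(events)
```

Each key of `counts` is one output row and each value is that row's event count. In SQL the same idea is usually expressed by extracting each parameter into its own column before grouping, instead of filtering on param.key in the WHERE clause.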

Related

Group by arbitrary interval

I have a column of type timestamp. I would like to dynamically group the results by an arbitrary time period (it could be 10 seconds or even 5 hours).
Suppose I have this kind of data:
[image]
If the user provides 2 hours and wants to get the max value of the air_pressure, I would like to have the first row combined with the second one. The result should look like this:
date | max air_pressure
2022-11-22 00:00:00:000 | 978.81666667
2022-11-22 02:00:00:000 | 978.53
2022-11-22 04:00:00:000 | 987.23333333
and so on. As I mentioned, the period must be easy to change, because the user may want to group by days or seconds instead.
The functionality should work like the function date_trunc(). But that can only group by fixed units such as seconds, minutes, or hours, while I would like to group by arbitrary intervals.
Basically:
SELECT g.start_time, max(air_pressure) AS max_air_pressure
FROM   generate_series($start
                     , $end
                     , interval '15 min') g(start_time)
LEFT   JOIN tbl t ON t.date_id >= g.start_time
                 AND t.date_id <  g.start_time + interval '15 min'  -- same interval
GROUP  BY 1
ORDER  BY 1;
$start and $end are timestamps delimiting your time frame of interest.
Returns all time slots, and NULL for max_air_pressure if no matching entries are found for the time slot.
See:
Best way to count rows by arbitrary time intervals
Aside: "date_id" is an unfortunate column name for a timestamp.
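For comparison, here is a minimal Python sketch of the same bucketing idea. It uses a different technique from the query above: instead of generating every slot and joining, it floors each timestamp to the start of its slot, so empty slots are simply absent from the result. The function names, the `origin` parameter and the sample readings are assumptions made for this example.

```python
from datetime import datetime, timedelta

def bucket_start(ts, interval, origin):
    """Floor ts to the start of its interval-sized slot, counted from origin."""
    slots = (ts - origin) // interval  # timedelta // timedelta gives an int
    return origin + slots * interval

def max_per_bucket(rows, interval, origin):
    """Return {slot_start: max value} for (timestamp, value) rows."""
    result = {}
    for ts, value in rows:
        start = bucket_start(ts, interval, origin)
        if start not in result or value > result[start]:
            result[start] = value
    return dict(sorted(result.items()))

# Made-up readings, for illustration only.
origin = datetime(2022, 11, 22, 0, 0)
rows = [
    (datetime(2022, 11, 22, 0, 10), 1.0),
    (datetime(2022, 11, 22, 1, 50), 3.0),
    (datetime(2022, 11, 22, 2, 0), 2.0),
    (datetime(2022, 11, 22, 5, 30), 4.0),
]
buckets = max_per_bucket(rows, timedelta(hours=2), origin)
```

Changing the grouping period only means passing a different `timedelta`, for example `timedelta(seconds=10)` or `timedelta(days=1)`.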

GROUP BY date and empty data

I have a table hits with columns created and user_id.
I want to get hit counts for the last 30 days, grouped by day. The problem is that on some days a user has no traffic,
and as a result those days are missing from the report.
How can I get a row for every day, with 0 hits on the days where there are none?
My query:
SELECT user_id, toDate(created) as date, COUNT() as count
FROM hits
WHERE created > NOW() - INTERVAL 30 DAY
GROUP BY toDate(created), user_id
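One common way to get the missing days is to build a calendar of all days first and LEFT JOIN the hits onto it. The sketch below shows that idea using Python's built-in sqlite3 module, not the database engine from the question, so the date functions differ from the query above. The sample rows, the fixed date range and the single-user filter are assumptions chosen to keep the example small and deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hits (created TEXT, user_id INTEGER)")
# Made-up sample hits: two on Jan 1, one on Jan 3, none on Jan 2 or Jan 4.
conn.executemany(
    "INSERT INTO hits VALUES (?, ?)",
    [
        ("2024-01-01 10:00:00", 1),
        ("2024-01-01 12:30:00", 1),
        ("2024-01-03 09:15:00", 1),
    ],
)

query = """
WITH RECURSIVE days(day) AS (
    SELECT date(:first_day)
    UNION ALL
    SELECT date(day, '+1 day') FROM days WHERE day < date(:last_day)
)
SELECT days.day, COUNT(hits.user_id) AS hit_count
FROM days
LEFT JOIN hits
       ON date(hits.created) = days.day
      AND hits.user_id = :user_id
GROUP BY days.day
ORDER BY days.day
"""
# COUNT(hits.user_id) ignores the NULLs produced by unmatched days, giving 0.
daily_counts = conn.execute(
    query, {"first_day": "2024-01-01", "last_day": "2024-01-04", "user_id": 1}
).fetchall()
```

The user filter sits in the join condition rather than in a WHERE clause; putting it in WHERE would discard the days with no hits again.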

Time looping an average

I have a table with 17,000 records, ordered by time and spaced at 15-minute intervals. The time values repeat every 24 hours, so, for example, I could have 100 records that are all at 1 AM, just on different days. I want to create an 'average day' by taking those 100 records at 1 AM and averaging them to get the averaged 1 AM value.
I don't know how to format the table to make it show up nicely here.
I'm assuming you want to calculate the average value per time interval regardless of the day in a query. You could use this SQL to group your table by Time interval only (assuming that it's separate from the date field), and average whichever fields you want to average. Do not select or group by the date field, just select and group by the time field.
SELECT TimeField
, AVG([Field1ToAverage])
, AVG([Field2ToAverage])
FROM MyTable
GROUP BY TimeField;
If the date and time fields are stored together in the same column, you will have to extract the time only:
SELECT TimeValue([DateTimeField])
, AVG([Field1ToAverage])
, AVG([Field2ToAverage])
FROM MyTable
GROUP BY TimeValue([DateTimeField]);

Check if a time period is included in another time period

In my db I have many records, each with a start date and the length of the period in days.
For example
id start_date length
1 2013-01-01 00:00:00 20
2 2013-02-30 00:00:00 10
3 2013-01-20 00:00:00 3
So I can easily get the end date.
Now, if the user gives me any period of time, how can I check whether that period is included in one of the periods that I have in the db?
Thank you.
You can get the list using a where clause and the date functions:
select *
from t
where XXX between start_date and date_add(start_date, interval length day);
EDIT:
The above is for a single date. If the user gives two dates, XXX and YYY (the start and end of a period), then this is what you want for any overlap:
select *
from t
where XXX <= date_add(start_date, interval length day) and
      YYY >= start_date;
That is, the period the user gives you starts before the end of the interval and the period ends after the start of the interval.
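The overlap test above, and the stricter containment test that the question title asks about, can be sketched in Python like this. The function names are made up for this example, and the stored period is treated as inclusive at both ends, matching the BETWEEN in the first query.

```python
from datetime import datetime, timedelta

def period_end(start, length_days):
    """End of a stored period: start date plus its length in days."""
    return start + timedelta(days=length_days)

def overlaps(q_start, q_end, start, length_days):
    """True if the user's period shares at least one instant with the stored one."""
    return q_start <= period_end(start, length_days) and q_end >= start

def contains(q_start, q_end, start, length_days):
    """True if the user's period lies entirely inside the stored one."""
    return start <= q_start and q_end <= period_end(start, length_days)

# Row 1 from the example table: starts 2013-01-01, lasts 20 days.
row_start, row_length = datetime(2013, 1, 1), 20

partly_inside = (datetime(2013, 1, 15), datetime(2013, 1, 25))
fully_inside = (datetime(2013, 1, 5), datetime(2013, 1, 10))
```

Here `partly_inside` overlaps row 1 but is not contained in it, while `fully_inside` satisfies both tests.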

Oracle SQL Query to check valid date range for an activity?

I have a table called Activity with columns activity_id and activity_date.
Consider data in the activity table like the following:
activity_id activity_date
1 1st June
2 1st July
3 1st August
4 1st September
5 1st October
Now I want to change the date of activity 3, but I cannot move it to 1st July or earlier, or to 1st September or later, as there are already other activities on those dates.
The only valid dates for activity 3 are from 2nd July to 31st August.
Similarly, for activity 1, the new date can be any date before 1st July.
For activity 5, the new date can be anything from 2nd September onwards, as it is the last activity.
I need to show a validation message to the user in the front end if the new date is not within the range.
The input to the query will be the activity id and the new activity date.
Below is the DDL script
CREATE TABLE "HEADCOUNT"."ACTIVITY"
( "ACTIVITY_ID" NUMBER(*,0) NOT NULL,
"ACTIVITY_DATE" DATE
);
Insert into "HEADCOUNT"."ACTIVITY" (ACTIVITY_ID,ACTIVITY_DATE) values (1,TO_DATE('01-06-2012','DD-MM-YYYY'));
Insert into "HEADCOUNT"."ACTIVITY" (ACTIVITY_ID,ACTIVITY_DATE) values (2,TO_DATE('01-07-2012','DD-MM-YYYY'));
Insert into "HEADCOUNT"."ACTIVITY" (ACTIVITY_ID,ACTIVITY_DATE) values (3,TO_DATE('01-08-2012','DD-MM-YYYY'));
Insert into "HEADCOUNT"."ACTIVITY" (ACTIVITY_ID,ACTIVITY_DATE) values (4,TO_DATE('01-09-2012','DD-MM-YYYY'));
Insert into "HEADCOUNT"."ACTIVITY" (ACTIVITY_ID,ACTIVITY_DATE) values (5,TO_DATE('01-10-2012','DD-MM-YYYY'));
This will find the date ranges for each row:
SELECT activity_id, activity_date
      ,NVL( LAG(activity_date) OVER(ORDER BY activity_id)
           ,TO_DATE('1900-01-01', 'YYYY-MM-DD')
          ) AS previous_date
      ,NVL( LEAD(activity_date) OVER(ORDER BY activity_id)
           ,TO_DATE('2100-01-01', 'YYYY-MM-DD')
          ) AS next_date
FROM activity
ORDER BY activity_id
Result:
ACTIVITY_ID ACTIVITY_DATE PREVIOUS_DATE NEXT_DATE
----------- ------------- ------------- ---------
          1 01-JUN-12     01-JAN-00     01-JUL-12
          2 01-JUL-12     01-JUN-12     01-AUG-12
          3 01-AUG-12     01-JUL-12     01-SEP-12
          4 01-SEP-12     01-AUG-12     01-OCT-12
          5 01-OCT-12     01-SEP-12     01-JAN-00
Validation would then be for a given id:
"input date" > previous_date AND "input date" < next_date
Date ranges are based on previous and following records when ordered by activity_id. Perhaps the ordering should really be by activity_date, though. Using LAG and LEAD will allow for gaps in activity_ids as well.
Alternatively, find the date limits using a query like the one below (replace #param_id with the id of the activity being changed):
SELECT activity_id, activity_date
FROM activity
WHERE activity_id = #param_id-1
OR activity_id = #param_id+1
This query will return at most two rows; for the first and last activities only one row will be returned. So you should read the results in the front end and decide what to do:
Determine the beginning limit: The row with id #param_id-1 gives the beginning date limit. If there is no row with this id, this is the first activity and there is no limit on the begin date.
Determine the ending limit: The row with id #param_id+1 gives the ending date limit. If there is no row with this id, this is the last activity and there is no limit on the end date.
Do or warn: If the new date is within the range, perform the change. Otherwise warn the user.
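The front-end steps above could be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the activities are held in a plain dictionary instead of being queried, the function names are made up, and neighbours are found by position in id order rather than by id ± 1, so gaps in the ids are tolerated.

```python
from datetime import date

# Sample data from the question, held in memory for this sketch.
ACTIVITIES = {
    1: date(2012, 6, 1),
    2: date(2012, 7, 1),
    3: date(2012, 8, 1),
    4: date(2012, 9, 1),
    5: date(2012, 10, 1),
}

def date_limits(activities, activity_id):
    """Return (lower, upper) neighbour dates; None means no limit on that side."""
    ids = sorted(activities)
    pos = ids.index(activity_id)
    lower = activities[ids[pos - 1]] if pos > 0 else None
    upper = activities[ids[pos + 1]] if pos + 1 < len(ids) else None
    return lower, upper

def is_valid_new_date(activities, activity_id, new_date):
    """True if new_date falls strictly between the neighbouring activities."""
    lower, upper = date_limits(activities, activity_id)
    after_previous = lower is None or new_date > lower
    before_next = upper is None or new_date < upper
    return after_previous and before_next
```

For example, `is_valid_new_date(ACTIVITIES, 3, date(2012, 8, 31))` is accepted, while `date(2012, 9, 1)` is rejected because activity 4 already falls on that date.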