SQL query with logic

Please help me with an SQL query.
My table structure with data is:
How can I select all rows without duplicate Event_IDs?
The Event_NAME should preferably contain 'http://'.
But if no Event_NAME with 'http://' exists for an event, then the Event_NAME containing 'Connect' should be selected.
The expected selection result is:
The syntax is Oracle.
Thanks in advance for your help.

If I understand correctly, you want to select all rows with Event_NAME containing 'http://', and then select any events not in the first set that have 'Connect' in Event_NAME.
I'm assuming that the only possibilities are the ones you show above - either there are two entries (http and Connect) for an event, or there's just 'Connect' - though this query could work for other situations.
The query is a union between 1. all events with 'http://' in the Event_NAME, and 2. events that don't have an Event_ID in the first set and that have 'Connect' in their Event_NAME.
There are probably prettier ways to do this, but it works in Oracle with the test data:
SELECT * FROM eventtest WHERE Event_NAME LIKE 'http://%'
UNION
SELECT * FROM eventtest
WHERE Event_NAME LIKE '%Connect%'
  AND Event_ID NOT IN
      (SELECT Event_ID FROM eventtest WHERE Event_NAME LIKE 'http://%');
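The UNION approach can be exercised with an in-memory SQLite sketch (the table name eventtest matches the answer, but the sample rows are invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE eventtest (ID INTEGER, Event_ID INTEGER, Event_NAME TEXT);
INSERT INTO eventtest VALUES
  (1, 100, 'http://example.com/a'),
  (2, 100, 'Connect to A'),
  (3, 200, 'Connect to B');
""")

# First branch: every event that has an http:// name.
# Second branch: 'Connect' names for events with no http:// name at all.
rows = conn.execute("""
SELECT * FROM eventtest WHERE Event_NAME LIKE 'http://%'
UNION
SELECT * FROM eventtest
WHERE Event_NAME LIKE '%Connect%'
  AND Event_ID NOT IN
      (SELECT Event_ID FROM eventtest WHERE Event_NAME LIKE 'http://%')
""").fetchall()
# Event 100 comes back as its http row, event 200 as its Connect row.
```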

Here are two approaches requiring only a single pass of the data (both rely on 'http://...' sorting after 'Connect...' in a case-sensitive collation, since 'h' comes after 'C'):
SELECT
Event_ID
, MAX(Event_NAME) AS Event_NAME
FROM eventtest
GROUP BY
Event_ID
;
SELECT
ID
, Event_ID
, Event_NAME
FROM (
SELECT
ID
, Event_ID
, Event_NAME
, ROW_NUMBER() OVER (PARTITION BY Event_ID ORDER BY Event_NAME DESC) AS rn
FROM eventtest
) dt
WHERE rn = 1
;
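The ROW_NUMBER variant can be checked with the same invented sample data in SQLite (window functions need SQLite 3.25+):

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # window functions need SQLite 3.25+
conn.executescript("""
CREATE TABLE eventtest (ID INTEGER, Event_ID INTEGER, Event_NAME TEXT);
INSERT INTO eventtest VALUES
  (1, 100, 'http://example.com/a'),
  (2, 100, 'Connect to A'),
  (3, 200, 'Connect to B');
""")

# ORDER BY Event_NAME DESC puts 'http://...' before 'Connect...'
# because 'h' sorts after 'C' under the default binary collation.
rows = conn.execute("""
SELECT ID, Event_ID, Event_NAME
FROM (
  SELECT ID, Event_ID, Event_NAME,
         ROW_NUMBER() OVER (PARTITION BY Event_ID
                            ORDER BY Event_NAME DESC) AS rn
  FROM eventtest
) dt
WHERE rn = 1
ORDER BY Event_ID
""").fetchall()
```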

Related

Grouping by Session ID in BigQuery for GA4 data

I'm currently trying to build a query that allows me to group all my GA4 event data by session ID, in order to get information about all the events per session, as opposed to analyzing the data by each event separately.
The resulting output of my initial query is a new table that has session ID as its own column in the table, instead of being within an array for event parameters for a particular event.
The problem is that the session_id column has non-unique values: a session id is repeated once for each row that is a new event (happening within that session). I am trying to combine (merge) those non-unique session ids so that I can get ALL the events associated with a particular session_id.
I have tried this query which provides me with session_id as a new column, that is repeated for each event.
SELECT
*,
(
SELECT COALESCE(value.int_value, value.float_value, value.double_value)
FROM UNNEST(event_params)
WHERE key = 'ga_session_id'
) AS session_id,
(
SELECT COALESCE(value.string_value)
FROM UNNEST(event_params)
WHERE key = 'page_location'
) AS page_location
FROM
`digital-marketing-xxxxxx.analytics_xxxxxxx.events_intraday*`
gives me an output like this (it has way more columns than this, but just as an example):
session_id | event_name
-----------+--------------
1234567    | session_start
1234567    | click_url
I need a way to basically merge the two session ids into a single cell. When I try this:
SELECT
*,
(
SELECT COALESCE(value.int_value, value.float_value, value.double_value)
FROM UNNEST(event_params)
WHERE key = 'ga_session_id'
) AS session_id,
(
SELECT COALESCE(value.string_value)
FROM UNNEST(event_params)
WHERE key = 'page_location'
) AS page_location
FROM
`digital-marketing-xxxxxxx.analytics_xxxxxxx.events_intraday*`
GROUP BY session_id
I get an error that tells me (if I understand correctly) that I can't aggregate certain values (like date), which is what the code is trying to do when attempting to group by session id.
Is there any way around this? I'm new to SQL, but the searches I've done so far haven't given me a clear answer on how to attempt this.
I use this code to understand the sequence of events; it might not be that efficient, as I have it set up to look at other things as well:
with _latest as (
SELECT
--create unique id
concat(user_pseudo_id,(select value.int_value from unnest(event_params) where key = 'ga_session_id')) as unique_session_id,
--create event id
concat(user_pseudo_id,(select value.int_value from unnest(event_params) where key = 'ga_session_id'),event_name) as session_ids,
event_name,
event_date,
TIMESTAMP_MICROS(event_timestamp) AS event_timestamp
FROM *******
where
-- change the date range by using static and/or dynamic dates
_table_suffix between '20221113' and '20221114'),
Exit_count as (
select *,
row_number() over (partition by session_ids order by event_timestamp desc) as Event_order
from _latest)
select
Event_order,
unique_session_id,
event_date,
event_name
FROM
Exit_count
group by
Event_order,
event_name,
unique_session_id,
--pagepath,
event_date
--Country_site
order by
unique_session_id,
Event_order
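The core of this answer - building a composite session key and numbering events within it - can be sketched in SQLite, assuming the GA4 event_params have already been flattened into plain columns (table and rows are invented; window functions need SQLite 3.25+):

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # window functions need SQLite 3.25+
conn.executescript("""
CREATE TABLE events (user_pseudo_id TEXT, ga_session_id INTEGER,
                     event_name TEXT, event_timestamp INTEGER);
INSERT INTO events VALUES
  ('u1', 1234567, 'session_start', 1),
  ('u1', 1234567, 'click_url',     2),
  ('u2', 7654321, 'session_start', 3);
""")

# Same idea as the answer: concatenate user + session id into a unique
# session key, then number each session's events by timestamp.
rows = conn.execute("""
SELECT user_pseudo_id || CAST(ga_session_id AS TEXT) AS unique_session_id,
       event_name,
       ROW_NUMBER() OVER (PARTITION BY user_pseudo_id, ga_session_id
                          ORDER BY event_timestamp) AS event_order
FROM events
ORDER BY unique_session_id, event_order
""").fetchall()
```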

Oracle sum 1 to count by partition

Based on the first 6 columns, I'd like to calculate the desired count. For each partition of user_id, session_id and orig_id, ordered by rank_agse ascending, I'd like to start at 1 and add one every time the lag_agse column equals 'ACCESSED'. Please find it populated in the table below to illustrate what I want.
It seems to me that you are looking for
select user_id, session_id, orig_id, type, lag_agse, rank_agse,
count(case when type = 'ACCESSED' then 1 end)
over (partition by user_id, session_id, orig_id
order by rank_agse) as desired_count
from your_table
order by user_id, session_id, orig_id, rank_agse desc
;
See my Comment under your question regarding ascending vs descending order by RANK_AGSE.
Note that count() does the same job as summing 1 when type is 'ACCESSED' and 0 otherwise, and it does that job in a simpler way.
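The conditional running count also works outside Oracle; here is a sketch with invented sample rows in SQLite (window functions need SQLite 3.25+):

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # window functions need SQLite 3.25+
conn.executescript("""
CREATE TABLE t (user_id INTEGER, session_id INTEGER, orig_id INTEGER,
                type TEXT, rank_agse INTEGER);
INSERT INTO t VALUES
  (1, 1, 1, 'ACCESSED', 1),
  (1, 1, 1, 'VIEWED',   2),
  (1, 1, 1, 'ACCESSED', 3);
""")

# COUNT over an ORDER BY window is a running count; the CASE makes it
# count only 'ACCESSED' rows, exactly like the Oracle answer.
rows = conn.execute("""
SELECT rank_agse, type,
       COUNT(CASE WHEN type = 'ACCESSED' THEN 1 END)
         OVER (PARTITION BY user_id, session_id, orig_id
               ORDER BY rank_agse) AS desired_count
FROM t
ORDER BY rank_agse
""").fetchall()
```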

SQL - timeline based queries

I have a table of events which has:
user_id
event_name
event_time
There are event names of types: meeting_started, meeting_ended, email_sent
I want to create a query that counts the number of times an email has been sent during a meeting.
UPDATE: I'm using Google BigQuery.
Example query:
SELECT
  event_name,
  count(distinct user_id) AS users
FROM
  events_table
WHERE
  event_name IN ('meeting_started', 'meeting_ended')
GROUP BY 1
How can I achieve that?
Thanks!
You can do this in BigQuery using last_value():
Presumably, an email is sent during a meeting if the most recent "meeting" event is 'meeting_started'. So you can solve this by getting the most recent meeting event for each event and then filtering:
select et.*
from (select et.*,
             last_value(case when event_name in ('meeting_started', 'meeting_ended')
                             then event_name end ignore nulls)
               over (partition by user_id order by event_time) as last_meeting_event
      from events_table et
     ) et
where event_name = 'email_sent' and last_meeting_event = 'meeting_started'
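The `last_value(... ignore nulls)` logic is specific to engines like BigQuery, but the underlying idea - remember the most recent meeting event per user as you scan in time order - can be sketched procedurally (sample events invented):

```python
# Walk each user's events in time order, track the last meeting event,
# and keep emails whose last meeting event is 'meeting_started'.
events = [  # (user_id, event_name, event_time), invented sample data
    (1, 'meeting_started', 1),
    (1, 'email_sent',      2),
    (1, 'meeting_ended',   3),
    (1, 'email_sent',      4),
]

emails_in_meetings = []
last_meeting_event = {}  # user_id -> 'meeting_started' or 'meeting_ended'
for user_id, name, t in sorted(events, key=lambda e: (e[0], e[2])):
    if name in ('meeting_started', 'meeting_ended'):
        last_meeting_event[user_id] = name
    elif name == 'email_sent' and last_meeting_event.get(user_id) == 'meeting_started':
        emails_in_meetings.append((user_id, t))
# Only the email at time 2 falls inside a meeting.
```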
This reads like some kind of gaps-and-islands problem, where an island is a meeting, and you want the emails that belong to islands.
How do we define an island? Assuming that meeting starts and ends properly interleave, we can just compare the count of starts and ends on a per-user basis. If there are more starts than ends, then a meeting is in progress. Using this logic, you can get all emails that were sent during a meeting like so:
select *
from (
select e.*,
countif(event_name = 'meeting_started') over(partition by user_id order by event_time) as cnt_started,
countif(event_name = 'meeting_ended' ) over(partition by user_id order by event_time) as cnt_ended
from events_table e
) e
where event_name = 'email_sent' and cnt_started > cnt_ended
It is unclear where you want to go from here. If you want the count of such emails, just use select count(*) instead of select * in the outer query.
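The starts-versus-ends comparison translates to other engines too; SQLite has no countif(), but SUM(CASE WHEN ... THEN 1 ELSE 0 END) over the same window is equivalent (sample rows invented; window functions need SQLite 3.25+):

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # window functions need SQLite 3.25+
conn.executescript("""
CREATE TABLE events_table (user_id INTEGER, event_name TEXT, event_time INTEGER);
INSERT INTO events_table VALUES
  (1, 'meeting_started', 1),
  (1, 'email_sent',      2),
  (1, 'meeting_ended',   3),
  (1, 'email_sent',      4);
""")

# An email is "during a meeting" when, at its timestamp, more meetings
# have started than have ended for that user.
rows = conn.execute("""
SELECT user_id, event_time
FROM (
  SELECT e.*,
         SUM(CASE WHEN event_name = 'meeting_started' THEN 1 ELSE 0 END)
           OVER (PARTITION BY user_id ORDER BY event_time) AS cnt_started,
         SUM(CASE WHEN event_name = 'meeting_ended' THEN 1 ELSE 0 END)
           OVER (PARTITION BY user_id ORDER BY event_time) AS cnt_ended
  FROM events_table e
) e
WHERE event_name = 'email_sent' AND cnt_started > cnt_ended
""").fetchall()
# The email at time 4 is after the meeting ended, so only time 2 survives.
```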

Select specific value from a column in SQL

I am new to SQL and I would like to select specific values from a column, count them, and group them by another value; the code I am using is:
SELECT COUNT (DISTINCT event_name), client_package_id
FROM userevent
WHERE event_happened_at >= from_iso8601_timestamp('2020-10-19T11:44:24Z')
GROUP BY client_package_id
I am expecting 3 columns: one "client_package_id", another "event_name" (grouped by the specific values in that column), and a third with the count for column 2.
Here is the example:
If I'm reading it correctly, you need to also group by event_name, and include it in the SELECT.
eg
SELECT client_package_id, event_name, COUNT (*)
FROM userevent
WHERE event_happened_at >= from_iso8601_timestamp('2020-10-19T11:44:24Z')
GROUP BY client_package_id, event_name
If that doesn't return what you're after, please provide more details - ie sample data and expected result.
SELECT client_package_id, event_name, COUNT (event_name)
FROM userevent
WHERE event_happened_at >= from_iso8601_timestamp('2020-10-19T11:44:24Z')
GROUP BY client_package_id, event_name
SELECT client_package_id, event_name, COUNT(event_name) AS event_quantity
FROM userevent
WHERE event_happened_at >= from_iso8601_timestamp('2020-10-19T11:44:24Z')
GROUP BY client_package_id, event_name
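The two-column GROUP BY can be demonstrated with invented data in SQLite (the timestamp filter is omitted for brevity):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE userevent (client_package_id INTEGER, event_name TEXT);
INSERT INTO userevent VALUES
  (1, 'open'), (1, 'open'), (1, 'close'), (2, 'open');
""")

# Grouping by both columns yields one row per (package, event) pair,
# with the count of how often that pair occurs.
rows = conn.execute("""
SELECT client_package_id, event_name, COUNT(*) AS event_quantity
FROM userevent
GROUP BY client_package_id, event_name
ORDER BY client_package_id, event_name
""").fetchall()
```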

Max of a Date field into another field in Postgresql

I have a PostgreSQL table with a few fields such as id and date. I need to find the max date for each id and show it in a new field for all the ids. The SQLFiddle site was not responding, so I made an example in Excel. Here is a screenshot of the data and the desired output for the table.
You could use the windowing variant of max:
SELECT id, date, MAX(date) OVER (PARTITION BY id)
FROM mytable
Something like this might work:
WITH maxdts AS (
    SELECT id, MAX(date) AS maxdt FROM mytable GROUP BY id
)
SELECT t.id, t.date, m.maxdt
FROM mytable t
JOIN maxdts m ON t.id = m.id;
Keep in mind without more information that this could be a horribly inefficient query, but it will get you what you need.
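The windowing variant can be verified with a small SQLite sketch (table and dates invented; dates stored as text compare correctly in ISO format; window functions need SQLite 3.25+):

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # window functions need SQLite 3.25+
conn.executescript("""
CREATE TABLE mytable (id INTEGER, dt TEXT);
INSERT INTO mytable VALUES
  (1, '2020-01-01'), (1, '2020-03-01'), (2, '2020-02-01');
""")

# MAX over a PARTITION BY window repeats each id's max date on every
# one of that id's rows, without collapsing them.
rows = conn.execute("""
SELECT id, dt, MAX(dt) OVER (PARTITION BY id) AS maxdt
FROM mytable
ORDER BY id, dt
""").fetchall()
```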