I am trying to identify session-level traffic sources in GA4 data within BigQuery.
I initially used the traffic_source variables, until I realised they relate to the traffic source that first acquired the user:
For example, if I searched for Apple on Google, my traffic_source would be organic. If I then click on an email link from Apple, my traffic_source remains organic (not email).
I can spot traffic from marketing channels with utm tags from the session_start event, but I am now trying to find traffic from channels that don't use utm tags: direct, organic search and referral. If the page_location does not have utm tags and the event_params contain a populated page_referrer field, then I know where the traffic came from.
SELECT
(SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'page_location') AS url,
(SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'page_referrer') AS referrer,
REPLACE(REGEXP_EXTRACT((SELECT value.string_value FROM UNNEST(event_params) WHERE
key = 'page_location'), 'utm_source=([^&]+)'), '%20', ' ') AS utm_source,
REPLACE(REGEXP_EXTRACT((SELECT value.string_value FROM UNNEST(event_params) WHERE
key = 'page_location'), 'utm_campaign=([^&]+)'), '%20', ' ') AS utm_campaign,
REPLACE(REGEXP_EXTRACT((SELECT value.string_value FROM UNNEST(event_params) WHERE
key = 'page_location'), 'utm_content=([^&]+)'), '%20', ' ') AS utm_content,
REPLACE(REGEXP_EXTRACT((SELECT value.string_value FROM UNNEST(event_params) WHERE
key = 'page_location'), 'utm_term=([^&]+)'), '%20', ' ') AS utm_term,
CASE
WHEN
REGEXP_CONTAINS((SELECT value.string_value FROM UNNEST(event_params) WHERE
key = 'page_location'), 'utm_') IS FALSE
AND (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'page_referrer') IS NOT NULL
THEN 'organic'
AND utm_medium = (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'page_referrer')
ELSE REPLACE(REGEXP_EXTRACT((SELECT value.string_value FROM UNNEST(event_params) WHERE
key = 'page_location'), 'utm_medium=([^&]+)'), '%20', ' ')
END AS utm_medium
FROM `bigquery***.analytics_***.events_***`
WHERE event_name = 'session_start'
So in my query I am extracting the utm values from the page location and placing them in new columns.
For organic search (which doesn't use utm tags) I am using a CASE statement. I want to set utm_medium to 'organic' AND utm_source to the referral site. Essentially, I am trying to do this:
CASE
WHEN A = 1 THEN
B = 'apple' AND C = 'fruit'
END
Is this possible in BigQuery?
A CASE statement is an expression that produces a single column. The simplest approach is to repeat the same condition in two CASE expressions:
...
CASE WHEN a = 1 THEN 'apple' ELSE ... END AS b,
CASE WHEN a = 1 THEN 'fruit' ELSE ... END AS c,
...
Alternatively, you could complicate things and use a STRUCT type column:
...
CASE WHEN a = 1 THEN STRUCT('apple' AS B, 'fruit' AS C) ELSE ... END AS new_struct,
...
This accomplishes your goal in one step and allows you to do things like SELECT new_struct.B, new_struct.C in future.
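As a rough illustration of the first approach (repeating the condition once per output column), here is a minimal sketch that runs the pattern against an in-memory SQLite database from Python. The table t, the helper label_rows, and the ELSE values 'other' and 'unknown' are made up for the example; SQLite is used only because it is easy to run locally, and it has no STRUCT type, so only the two-CASE variant is shown.

```python
import sqlite3


def label_rows(values):
    """Derive two columns, b and c, from the same condition on a."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical one-column table standing in for the real data.
    conn.execute("CREATE TABLE t (a INTEGER)")
    conn.executemany("INSERT INTO t (a) VALUES (?)", [(v,) for v in values])
    rows = conn.execute(
        """
        SELECT a,
               -- same WHEN condition, one CASE expression per output column
               CASE WHEN a = 1 THEN 'apple' ELSE 'other' END AS b,
               CASE WHEN a = 1 THEN 'fruit' ELSE 'unknown' END AS c
        FROM t
        ORDER BY rowid
        """
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in label_rows([1, 2]):
        print(row)
```

Each CASE is evaluated independently, so the two columns stay in sync only as long as both WHEN conditions are kept identical.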
I have this query:
insert into changes (id_registro)
select d2.id_registro
from daily2 d2
where exists (
select 1
from daily d1
where
d1.id_registro = d2.id_registro
and (d2.origen, d2.sector, d2.entidad_um, d2.sexo, d2.entidad_nac, d2.entidad_res,
d2.municipio_res, d2.tipo_paciente,d2.fecha_ingreso, d2.fecha_sintomas,
d2.fecha_def, d2.intubado, d2.neumonia, d2.edad, d2.nacionalidad, d2.embarazo,
d2.habla_lengua_indig, d2.diabetes, d2.epoc, d2.asma, d2.inmusupr, d2.hipertension,
d2.otra_com, d2.cardiovascular, d2.obesidad,
d2.renal_cronica, d2.tabaquismo, d2.otro_caso, d2.resultado, d2.migrante,
d2.pais_nacionalidad, d2.pais_origen, d2.uci )
<>
(d1.origen, d1.sector, d1.entidad_um, d1.sexo, d1.entidad_nac, d1.entidad_res,
d1.municipio_res, d1.tipo_paciente, d1.fecha_ingreso, d1.fecha_sintomas,
d1.fecha_def, d1.intubado, d1.neumonia, d1.edad, d1.nacionalidad, d1.embarazo,
d1.habla_lengua_indig, d1.diabetes, d1.epoc, d1.asma, d1.inmusupr, d1.hipertension,
d1.otra_com, d1.cardiovascular, d1.obesidad,
d1.renal_cronica, d1.tabaquismo, d1.otro_caso, d1.resultado, d1.migrante,
d1.pais_nacionalidad, d1.pais_origen, d1.uci ))
It inserts the IDs of records whose data differs from the other table, which is fine. But I want to know exactly which field has changed so that I can store it in a log table.
You don't mention precisely what you expect to see in your output, but to accomplish what you're after you'll need a long sequence of CASE expressions, one for each column.
For example, one approach might be to create a comma-separated list of the column names that have changed:
INSERT INTO changes (id_registro, column_diffs)
SELECT d2.id_registro,
CONCAT(
CASE WHEN d1.origen <> d2.origen THEN 'Origen,' ELSE '' END,
CASE WHEN d1.sector <> d2.sector THEN 'Sector,' ELSE '' END,
etc.
Within the THEN part of the CASE you can build whatever detail you want to show
e.g. a string showing before and after values of the columns CONCAT('Origen: Was==> ', d1.origen, ' Now==>', d2.origen). Presumably though you'll also need to record the times of these changes if there can be multiple updates to the same record throughout the day.
Essentially you'll need to decide what information you want to show in your logfile, but based on your example query you should have all the information you need.
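To make the per-column CASE idea concrete, here is a small sketch using Python's sqlite3 module. It is not the asker's schema: only two of the columns (origen, sector) are included, the helper name changed_columns is invented, and SQLite's || operator stands in for CONCAT. Note that <> evaluates to NULL when either side is NULL, so a change to or from NULL is not reported by this version.

```python
import sqlite3


def changed_columns(old_rows, new_rows):
    """Return (id_registro, column_diffs) for rows that differ between tables."""
    conn = sqlite3.connect(":memory:")
    # Cut-down, hypothetical versions of daily (old) and daily2 (new).
    for table in ("daily", "daily2"):
        conn.execute(
            f"CREATE TABLE {table} "
            "(id_registro TEXT PRIMARY KEY, origen INTEGER, sector INTEGER)"
        )
    conn.executemany("INSERT INTO daily VALUES (?, ?, ?)", old_rows)
    conn.executemany("INSERT INTO daily2 VALUES (?, ?, ?)", new_rows)
    rows = conn.execute(
        """
        SELECT d2.id_registro,
               -- one CASE per column; each contributes its name or nothing
               CASE WHEN d1.origen <> d2.origen THEN 'origen,' ELSE '' END ||
               CASE WHEN d1.sector <> d2.sector THEN 'sector,' ELSE '' END
                   AS column_diffs
        FROM daily2 d2
        JOIN daily d1 ON d1.id_registro = d2.id_registro
        WHERE d1.origen <> d2.origen OR d1.sector <> d2.sector
        ORDER BY d2.id_registro
        """
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    old = [("a", 1, 1), ("b", 1, 2)]
    new = [("a", 1, 1), ("b", 2, 3)]
    print(changed_columns(old, new))
```

The same SELECT could feed an INSERT INTO a log table; a timestamp column would be an obvious addition if a record can change more than once a day.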
I'm trying to assign a status based on the number of IDs using a metric. This is the query I've written (and it works):
select
x.yyyy_mm_dd,
x.prov_id,
x.app,
x.metric,
x.is_100,
case
when ((x.is_100 = 'true') or size(collect_set(x.list)) >10) then 'implemented'
when ((x.is_100 = 'false') and size(collect_set(x.list)) between 1 and 10) then 'first contact'
else 'no contact'
end as impl_status,
size(collect_set(x.list)) as array_size,
collect_set(x.list) as list
from(
select
yyyy_mm_dd,
prov_id,
app,
metric,
is_100,
list
from
my_table
lateral view explode(ids) e as list
) x
group by
1,2,3,4,5
However, the impl_status is incorrect for the second condition in the case statement. In the result set, I can see rows with is_100 = false, array_size between 1 and 10, however the impl_status ends up being 'no contact' instead of 'first contact'. I was thinking maybe between isn't inclusive but it seems to be according to the docs.
I am curious if this works:
(case when x.is_100 or count(distinct x.list) > 10
then 'implemented'
when (not x.is_100) and count(x.list) > 0
then 'first contact'
else 'no contact'
end) as impl_status,
This should be the same logic without the string comparisons -- here is an interesting viewpoint on booleans in Hive. I also think that COUNT() is clearer than the array functionality.
Make sure there is no hidden whitespace in the string:
when (( trim(x.is_100) = 'false') and size(collect_set(x.list)) between 1 and 10) then 'first contact'
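The intended classification from the answers above can be checked outside Hive with a plain Python sketch. The function impl_status and its arguments are hypothetical; it normalises is_100 first (trimming whitespace and ignoring case, which is the failure mode the second answer suspects) and then applies the same three branches using a distinct count.

```python
def impl_status(is_100, ids):
    """Classify one group the way the CASE expression is meant to.

    is_100 may be a bool or a string such as 'true' or ' false ';
    ids is the list of exploded id values for the group.
    """
    # Normalise the flag so stray whitespace or casing cannot break the match.
    flag = str(is_100).strip().lower() == "true"
    distinct_ids = len(set(ids))  # plays the role of size(collect_set(...))

    if flag or distinct_ids > 10:
        return "implemented"
    if distinct_ids > 0:  # here the flag is false and the count is 1..10
        return "first contact"
    return "no contact"


if __name__ == "__main__":
    print(impl_status(" false", ["x", "y", "y"]))
    print(impl_status("true", []))
    print(impl_status(False, []))
```

Because the first branch already handles every count above 10, the second branch only needs to test for a non-empty set, which is why the rewritten CASE uses a simple greater-than-zero check.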
I have this code:
CASE WHEN url LIKE 'utm_medium'
THEN
SPLIT_PART( -- slice UTM from URL
SPLIT_PART(pageviews.url,'utm_medium=',2)
,'&',1
)
ELSE NULL END AS utm_medium,
CASE
WHEN utm_medium = 'paidsocial'
THEN channel = 'Paid Social'
WHEN utm_medium = 'email'
THEN channel = 'Email'
END
In the first CASE, I extract utm_medium param from URL as utm_medium column, and in second CASE I'd like to create another column channel based on utm_medium value.
I'm getting error:
column "utm_medium" does not exist
LINE 153: WHEN utm_medium = 'paidsocial'
Is it possible to query utm_medium column just after it is created?
split_part() suggests Postgres, which supports lateral joins. These allow you to define aliases in the FROM clause without using subqueries or CTEs. That would be:
SELECT pv.*, v.utm_medium,
       (CASE WHEN v.utm_medium = 'paidsocial' THEN 'Paid Social'
             WHEN v.utm_medium = 'email' THEN 'Email'
        END) AS channel
FROM pageviews pv CROSS JOIN LATERAL
     (VALUES (CASE WHEN pv.url LIKE '%utm_medium=%'
                   THEN SPLIT_PART(SPLIT_PART(pv.url, 'utm_medium=', 2), '&', 1)
              END)
     ) v(utm_medium)
Is it possible to query utm_medium column just after it is created?
Most likely not.
Some engines may support this non standard SQL feature you want. You don't mention which database you are using, so it's not clear.
The general solution for this is to produce a bona fide column with the name you want using a table expression or a CTE.
For example:
select
  col1,
  col2,
  utm_medium,
  CASE
    WHEN utm_medium = 'paidsocial' THEN 'Paid Social'
    WHEN utm_medium = 'email' THEN 'Email'
  END AS channel
from (
  select
    col1,
    col2,
    CASE WHEN url LIKE '%utm_medium=%'
      THEN
        SPLIT_PART( -- slice UTM from URL
          SPLIT_PART(pageviews.url, 'utm_medium=', 2)
          , '&', 1
        )
      ELSE NULL
    END AS utm_medium
  from pageviews
) x
The derived-table pattern in this example runs on virtually all databases (SPLIT_PART itself is engine-specific). As you can see, a table expression x is used in the FROM clause, effectively defining a column called utm_medium in it. The main query can then use this column right away.
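The same idea, defining the alias in an inner query and consuming it in an outer one, can be tried locally with the sketch below. It uses Python's sqlite3 module and CTEs; since SQLite has no SPLIT_PART, the extraction is approximated with instr and substr, and the helper name channels and the sample URLs are invented for illustration.

```python
import sqlite3

QUERY = """
WITH tails AS (
    -- step 1: everything after 'utm_medium=' (NULL if the tag is absent)
    SELECT rowid AS id, url,
           CASE WHEN instr(url, 'utm_medium=') > 0
                THEN substr(url, instr(url, 'utm_medium=') + length('utm_medium='))
           END AS tail
    FROM pageviews
),
mediums AS (
    -- step 2: cut the tail at the next '&' to get the alias utm_medium
    SELECT id, url,
           CASE WHEN instr(tail, '&') > 0
                THEN substr(tail, 1, instr(tail, '&') - 1)
                ELSE tail
           END AS utm_medium
    FROM tails
)
-- step 3: utm_medium is now a real column and can be referenced freely
SELECT utm_medium,
       CASE WHEN utm_medium = 'paidsocial' THEN 'Paid Social'
            WHEN utm_medium = 'email' THEN 'Email'
       END AS channel
FROM mediums
ORDER BY id
"""


def channels(urls):
    """Return (utm_medium, channel) for each URL, in input order."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE pageviews (url TEXT)")
    conn.executemany("INSERT INTO pageviews (url) VALUES (?)", [(u,) for u in urls])
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(channels(["https://example.com/?utm_source=x&utm_medium=email&utm_campaign=y"]))
```

Each CTE plays the role of the table expression x: an alias created in one step only becomes usable by name in the next step.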
We receive auto-generated emails from an application, and we export those to our database as they arrive at the Inbox. The table is called dbo.MailArchive.
Up until recently, the body of the email has always looked like this...
Status: Completed
Successful actions count: 250
Page load count: 250
...except with different numbers and statuses. Note that there is a carriage return on the blank line after Page load count.
The entirety of this data gets written to a field called Mail_Body - then we run the following statement using OPENJSON to parse those lines into their own columns in the record:
DECLARE @PI varchar(7) = '%[^' + CHAR(13) + CHAR(10) + ']%';
SELECT j.Status,
j.Successful_Actions_Count,
j.Page_Load_Count
FROM dbo.MailArchive m
CROSS APPLY(VALUES(REVERSE(m.Mail_Body),PATINDEX(@PI,REVERSE(m.Mail_Body)))) PI(SY,I)
CROSS APPLY(VALUES(REVERSE(STUFF(PI.SY,1,PI.I,''))))S(FixedString)
CROSS APPLY OPENJSON (CONCAT('{"', REPLACE(REPLACE(S.FixedString, ': ', '":"'), CHAR(13) + CHAR(10), '","'), '"}'))
WITH (Status varchar(100) '$.Status',
Successful_Actions_Count int '$."Successful actions count"',
Page_Load_Count int '$."Page load count"') j;
Beginning today, there are certain emails where the body of the email looks like this:
Agent did not meet defined success criteria on this run.
Status: Completed
Successful actions count: 250
Page load count: 250
To clarify, that's one new line at the top, a carriage return at the end of that line, and a carriage return on the blank line between the new line and the Status line. At this time, there is no consistent way to predict which emails will come in with this new line, and which ones won't.
How can I modify our OPENJSON statement to say, If this first line exists in the body, skip/ignore it and parse lines 3 through 5, else just do exactly what I have above? Or perhaps even better to future-proof it, always ignore everything before the word Status?
Since your data has new leading and trailing rows, I think a simple aggregation in concert with a string_split() and a CROSS APPLY would be more effective than my previous XML answer and the current JSON approach
Example or dbFiddle
Select A.ID
,Status = stuff(Pos1,1,charindex(':',Pos1),'')
,Action = try_convert(int,stuff(Pos2,1,charindex(':',Pos2),''))
,PageCnt = try_convert(int,stuff(Pos3,1,charindex(':',Pos3),''))
From YourTable A
Cross Apply (
Select [Pos1] = max(case when Value like 'Status:%' then value end)
,[Pos2] = max(case when Value like '%actions count:%' then value end)
,[Pos3] = max(case when Value like 'Page load count:%' then value end)
From string_split(SomeCol,char(10))
) B
Returns
ID Status Action PageCnt
1 Completed 250 250
Note: Use an OUTER APPLY if you want to see NULLs
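The approach above boils down to: split the body into lines, keep only the lines whose label is recognised, and ignore everything else. A minimal Python sketch of that logic follows; parse_mail_body is a hypothetical helper, not part of the original pipeline, and it trims whitespace and carriage returns before converting the counts.

```python
def parse_mail_body(body):
    """Pick the three known 'label: value' lines out of an email body.

    Lines without a recognised label (for example a new leading notice)
    are ignored, so extra rows before or after the data do not matter.
    """
    fields = {
        "Status": None,
        "Successful actions count": None,
        "Page load count": None,
    }
    for raw_line in body.splitlines():  # handles both \r\n and \n
        label, sep, value = raw_line.strip().partition(":")
        if sep and label in fields:
            fields[label] = value.strip()

    def to_int(text):
        # Mirrors try_convert(int, ...): None when the value is missing or bad.
        try:
            return int(text)
        except (TypeError, ValueError):
            return None

    return {
        "status": fields["Status"],
        "successful_actions_count": to_int(fields["Successful actions count"]),
        "page_load_count": to_int(fields["Page load count"]),
    }


if __name__ == "__main__":
    body = (
        "Agent did not meet defined success criteria on this run.\r\n\r\n"
        "Status: Completed\r\nSuccessful actions count: 250\r\n"
        "Page load count: 250\r\n\r\n"
    )
    print(parse_mail_body(body))
```

Matching on the label rather than on line position is what makes this tolerant of the new first line and of any further lines added later.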
select fti.pa_serial_,fti.homeownerm_name,fti.ward_,fti.villagetole,fti.status,
ftrq.date_reporting, ftrq.name_of_recorder_reporting,
case
when fti.status='terminate' then ftrq.is_the_site_cleared ='1' end as is_the_site_cleared from fti join ftrq on ftrq.fulcrum_parent_id = fti.fulcrum_id
Here, is_the_site_cleared is a text column, but the comparison inside the CASE expression turns the result into a boolean, so it is displayed as true rather than 1. I also tried explicitly outputting '1', but that did not work either. My aim is to display '1' in the is_the_site_cleared column when fti.status = 'terminate'. Please help!
How about using integers rather than booleans?
select fti.pa_serial_, fti.homeownerm_name, fti.ward_,
fti.villagetole, fti.status, ftrq.date_reporting,
ftrq.name_of_recorder_reporting,
(case when fti.status = 'terminate' -- and ftrq.is_the_site_cleared = '1'
then 1 else 0
end) as is_the_site_cleared
from fti join
ftrq
on ftrq.fulcrum_parent_id = fti.fulcrum_id ;
From the description, I cannot tell if you want to include the condition ftrq.is_the_site_cleared = '1' in the when condition. But the idea is to have the then and else return numbers if that is what you want to see.