I have nine similar CTEs. The only difference between them is the value compared against the column for in the WHERE clause of the subquery.
WITH my_cte_1 AS (
SELECT id,
"time",
LEAD("time",1) OVER (
PARTITION BY id
ORDER BY id,"time"
) next_time
FROM history
where id IN (SELECT id FROM req WHERE type = 'sup' AND for = 1)
),
my_cte_2 AS (
SELECT id,
"time",
LEAD("time",1) OVER (
PARTITION BY id
ORDER BY id,"time"
) next_time
FROM history
where id IN (SELECT id FROM req WHERE type = 'sup' AND for = 2)
),
my_cte_3 AS (
SELECT id,
"time",
LEAD("time",1) OVER (
PARTITION BY id
ORDER BY id,"time"
) next_time
FROM history
where id IN (SELECT id FROM req WHERE type = 'sup' AND for = 3)
)
SELECT
'History' AS "Indic",
(SELECT count(DISTINCT id) FROM my_cte_1) AS "cte1",
(SELECT count(DISTINCT id) FROM my_cte_2) AS "cte2",
(SELECT count(DISTINCT id) FROM my_cte_3) AS "cte3"
My database is read-only, so I can't create a function.
Each CTE processes a large amount of data.
Is there a way to set up a parameter for the column for, or some workaround?
I'm assuming a little bit here, but I would think something like this would work:
with cte as (
SELECT
h.id, h."time",
LEAD(h."time",1) OVER (PARTITION BY h.id ORDER BY h.id, h."time") next_time,
r."for"
FROM
history h
join req r on
r.type = 'sup' and
h.id = r.id and
r."for" between 1 and 3
)
select
'History' AS "Indic",
count(distinct id) filter (where "for" = 1) as cte1,
count(distinct id) filter (where "for" = 2) as cte2,
count(distinct id) filter (where "for" = 3) as cte3
from cte
This avoids multiple passes over the tables and should run much faster, unless these are highly selective values.
Another note: the LEAD analytic function doesn't appear to be used anywhere. If this is really all there is to your query, you can omit it and make the query run a lot faster still. I left it in assuming it serves some other purpose.
Here is my query so far:
SELECT
event_date,
event_timestamp,
user_pseudo_id,
geo.country,
geo.region,
geo.city,
geo.sub_continent,
(
SELECT
value.string_value
FROM
UNNEST(event_params)
WHERE
key = "search_term") AS search_term,
FROM `GoogleAnlyticsDatabase`
I am trying to exclude all NULL values in the 'search_term' column.
I am struggling to identify where I need to include IS NOT NULL in my code.
Everything I have tried so far has thrown up errors.
Does anyone have any ideas?
Is this query giving you the expected result, apart from the NULL problem?
If yes, you can just wrap your query in a CTE and filter that CTE, like:
WITH
SRC AS (
SELECT
event_date,
event_timestamp,
user_pseudo_id,
geo.country,
geo.region,
geo.city,
geo.sub_continent,
(
SELECT
value.string_value
FROM
UNNEST(event_params)
WHERE
key = "search_term") AS search_term
FROM `GoogleAnlyticsDatabase`
)
SELECT * FROM SRC WHERE search_term IS NOT NULL
My data looks like this:
For a particular "event_params.key" = "ga_session_id", I want to find the sum of "event_params.value.int_value" when "event_params.key" = "engagement_time_msec".
This is to be done for every user (column - "user_pseudo_id").
The "engagement_time_msec" parameter is present only in the "screen_view" and "user_engagement" events and can appear multiple times within one "ga_session_id".
Basically, "ga_session_id" is the unique id for every session a "user_pseudo_id" creates. I want to find the average session duration for the users.
Please help me.
Your question is really hard to follow. You seem to want something like the sum of engagement_time_msec:
select t.user_pseudo_id,
sum(case when ep.key = 'engagement_time_msec' then ep.value.int_value end)
from t left join
unnest(event_params) ep
on 1=1
group by 1;
I have no idea what the other keys are for. This appears to have the data you want.
You can use UNNEST to extract data from array of key-value pairs in BigQuery:
WITH unnested_table AS (
SELECT
user_pseudo_id,
(SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id') as ga_session_id,
(SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'engagement_time_msec') as engagement_time_msec
FROM myDataset.myTable
WHERE event_name in ('screen_view', 'user_engagement')
),
session_duration_table AS (
SELECT
user_pseudo_id,
ga_session_id,
SUM(engagement_time_msec) as session_duration
FROM unnested_table
GROUP BY user_pseudo_id, ga_session_id
)
SELECT
user_pseudo_id,
AVG(session_duration) as avg_session_duration
FROM session_duration_table
GROUP BY user_pseudo_id
Below is for BigQuery Standard SQL
#standardSQL
select user_pseudo_id, avg(session_time_msec) as avg_session_time_msec
from (
select user_pseudo_id,
(select value.int_value from e.event_params where key = 'ga_session_id') as ga_session_id,
sum((select value.int_value from e.event_params where key = 'engagement_time_msec')) as session_time_msec
from `project.dataset.table` e
group by 1, 2
)
group by 1
My table looks like this:
There's one more column, "user_pseudo_id", which is the unique id for users. I want the sum of "event_params.value.int_value" where "event_params.key" = 'engagement_time_msec', for every "user_pseudo_id" who has done "event_name" = 'yt_event'.
Also, event_params.key = 'engagement_time_msec' is only present in two events only, i.e. event_name = 'user_engagement' and 'screen_view'.
I have tried subqueries like this:
SELECT
user_pseudo_id,
(
select
sum(value.int_value/60000)
from unnest(event_params)
where
key = 'engagement_time_msec'
and
user_pseudo_id = 'yt_users') as eng_time_min
FROM
`Xyz.events_20201030`
where
event_name = 'yt_event'
But I am not able to get it to work.
Please help me. I will be highly obliged.
Thanks.
Below is for BigQuery Standard SQL
#standardSQL
select user_pseudo_id,
sum(( select value.int_value/60000
from t.event_params
where key = 'engagement_time_msec'
)) as eng_time_min
from `Xyz.events_20201030` t
where user_pseudo_id in (
select distinct user_pseudo_id
from `Xyz.events_20201030`
where event_name = 'yt_event'
)
group by user_pseudo_id
Hmmm . . . you can use aggregation and a having clause:
select user_pseudo_id,
sum(case when ep.key = 'engagement_time_msec' then ep.value.int_value / 60000 end) as etm
from `Xyz.events_20201030` e cross join
unnest(event_params) ep
where ep.key in ( 'engagement_time_msec')
group by 1
having countif(event_name = 'yt_event') > 0;
Can someone help me tune this query? It takes about a minute to return the data in SQL Developer.
SELECT
masterid, notification_id, notification_list, typeid,
subject, created_at, created_by, approver, sequence_no,
productid, statusid, updated_by, updated_at, product_list,
notification_status, template, notification_type, classification
FROM
(
SELECT
masterid, notification_id, notification_list, typeid, subject,
approver, created_at, created_by, sequence_no, productid,
statusid, updated_by, updated_at, product_list, notification_status,
template, notification_type, classification,
ROW_NUMBER() OVER (ORDER BY masterid DESC) AS r
FROM
(
SELECT DISTINCT
a.masterid AS masterid,
a.maxid AS notification_id,
notification_list,
typeid,
noti.subject AS subject,
noti.approver AS approver,
noti.created_at AS created_at,
noti.created_by AS created_by,
noti.sequence_no AS sequence_no,
a.productid AS productid,
a.statusid AS statusid,
noti.updated_by AS updated_by,
noti.updated_at AS updated_at,
(
SELECT LISTAGG(p.name,',') WITHIN GROUP(ORDER BY p.id) AS list_noti
FROM product p
INNER JOIN notification_product np ON np.product_id = p.id
WHERE notification_id = a.maxid
) AS product_list,
(
SELECT description
FROM notification_status
WHERE id = a.statusid
) AS notification_status,
(
SELECT name
FROM template
WHERE id = a.templateid
) AS template,
(
SELECT description
FROM notification_type
WHERE id = a.typeid
) AS notification_type,
(
SELECT tc.description
FROM template_classification tc
INNER JOIN notification nt ON tc.id = nt.classification_id
WHERE nt.id = a.maxid
) AS classification
FROM
(
SELECT
nm.id AS masterid,
nm.product_id AS productid,
nm.notification_status_id AS statusid,
nm.template_id AS templateid,
nm.notification_type_id AS typeid,
(
SELECT MAX(id)
FROM notification
WHERE notification_master_id = nm.id
) AS maxid,
(
SELECT LISTAGG(n.id,',') WITHIN GROUP(ORDER BY n.id) AS list_noti
FROM notification n
WHERE notification_master_id = nm.id
) AS notification_list
FROM notification_master nm
INNER JOIN notification nf ON nm.id = nf.notification_master_id
WHERE nm.disable = 'N'
ORDER BY nm.id DESC
) a
INNER JOIN notification noti
ON a.maxid = noti.id
AND
(
(
(
TO_DATE('01-jan-1970','dd-mon-yyyy') +
numtodsinterval(created_at / 1000,'SECOND')
) <
(current_date + INTERVAL '-21' DAY)
)
OR (typeid IN (2,4) AND statusid = 4)
)
)
)
WHERE r BETWEEN 11 AND 20
DISTINCT is very often an indicator of a badly written query. A normalized database doesn't contain duplicate data, so where do the duplicates that you must remove with DISTINCT suddenly come from? Very often your own query produces them. Avoid producing duplicates in the first place, and you won't need DISTINCT later.
In your case you are joining with the table notification in your subquery a, but you are not using any of its columns there; the join only multiplies the rows per notification_master_id, which you then deduplicate with DISTINCT.
After all, you want to get the notification masters and their latest related notification (by first finding its ID and then selecting that row). You don't need that many subqueries to achieve this.
Some side notes:
To get the description from template_classification, you join with the notification table again, which is not necessary.
ORDER BY in a subquery (ORDER BY nm.id DESC) is superfluous, because per standard SQL subquery results are unsorted. (Oracle sometimes violates this standard so that ROWNUM can be applied to the result, but you are not using ROWNUM in your query.)
It's a pity that created_at is stored as a number rather than as a DATE or TIMESTAMP, since this forces you to calculate. I don't think it has a great impact on this query, though, because you only use it inside an OR condition.
CURRENT_DATE gets you the client's date. This is rarely what you want: you are selecting data from the database, which should relate not to some client's date but to its own, SYSDATE.
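To see the difference on your own system, you can compare the two side by side (Oracle; the session time zone value here is arbitrary, chosen only so the dates can diverge):

```sql
ALTER SESSION SET TIME ZONE = '-8:00';

-- SYSDATE reflects the database server's clock;
-- CURRENT_DATE reflects the session (client) time zone,
-- so near midnight the two can differ by a day.
SELECT SYSDATE, CURRENT_DATE FROM dual;
```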
If I am not mistaken, your query can be shortened to:
SELECT
nm.id AS masterid,
nf.id AS notification_id,
nfagg.notification_list AS notification_list,
nm.notification_type_id AS typeid,
nf.subject AS subject,
nf.approver AS approver,
nf.created_at AS created_at,
nf.created_by AS created_by,
nf.sequence_no AS sequence_no,
nm.product_id AS productid,
nm.notification_status_id AS statusid,
nf.updated_by AS updated_by,
nf.updated_at AS updated_at,
(
SELECT LISTAGG(p.name, ',') WITHIN GROUP (ORDER BY p.id)
FROM product p
INNER JOIN notification_product np ON np.product_id = p.id
WHERE np.notification_id = nf.id
) AS product_list,
(
SELECT description
FROM notification_status
WHERE id = nm.notification_status_id
) AS notification_status,
(
SELECT name
FROM template
WHERE id = nm.template_id
) AS template,
(
SELECT description
FROM notification_type
WHERE id = nm.notification_type_id
) AS notification_type,
(
SELECT description
FROM template_classification
WHERE id = nf.classification_id
) AS classification
FROM notification_master nm
INNER JOIN
(
SELECT
notification_master_id,
MAX(id) AS maxid,
LISTAGG(id,',') WITHIN GROUP (ORDER BY id) AS notification_list
FROM notification
GROUP BY notification_master_id
) nfagg ON nfagg.notification_master_id = nm.id
INNER JOIN notification nf
ON nf.id = nfagg.maxid
AND
(
(
DATE '1970-01-01' + NUMTODSINTERVAL(nf.created_at / 1000, 'SECOND')
< CURRENT_DATE + INTERVAL '-21' DAY
)
OR (nm.notification_type_id IN (2,4) AND nm.notification_status_id = 4)
)
WHERE nm.disable = 'N'
ORDER BY nm.id DESC
OFFSET 10 ROWS
FETCH NEXT 10 ROWS ONLY;
As mentioned, you may want to replace CURRENT_DATE with SYSDATE.
I recommend the following indexes for the query:
CREATE INDEX idx1 ON notification_master (disable, id, notification_status_id, notification_type_id);
CREATE INDEX idx2 ON notification (notification_master_id, id, created_at);
A last remark on paging: in order to skip n rows and get the next n, the whole query must be executed for all data and all result rows sorted, only to pick n of them at the end. It is usually better to remember the last fetched ID and, on the next execution, select only rows past it in the sort order.
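A minimal sketch of that keyset approach for the query above, assuming the result stays ordered by nm.id DESC and the application remembers the smallest masterid of the previous page (:last_seen_id is a hypothetical bind variable; columns abbreviated):

```sql
SELECT nm.id AS masterid,
       nm.notification_type_id AS typeid
       -- ... remaining columns and joins as in the query above
FROM notification_master nm
WHERE nm.disable = 'N'
  AND nm.id < :last_seen_id  -- smallest masterid seen on the previous page
ORDER BY nm.id DESC
FETCH FIRST 10 ROWS ONLY;
```

Each page then only has to scan rows past the remembered key, which the suggested index on (disable, id, ...) supports directly, instead of re-reading and discarding everything before the offset.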