I'm new to SQL and I'm trying to write a query:
SELECT
clientId,
pagePath,
SUM(CASE
WHEN isExit IS NOT NULL THEN last_interaction
ELSE
nextTime
END
) AS time_on_page
FROM (
SELECT
hits.page.pagePath,
hits.isExit,
hits.time/1000 AS hits_time,
LEAD(hits.time/1000, 1) OVER (PARTITION BY fullVisitorId, visitid ORDER BY hits.time ASC) AS nextTime,
MAX(
IF
(hits.isInteraction = TRUE,
hits.time / 1000,
0)) OVER (PARTITION BY fullVisitorId, visitid) AS last_interaction
FROM
`merck-bigquery.1===.ga_sessions_20201231`,
UNNEST(hits) AS hits
WHERE
hits.type = "PAGE"
AND hits.page.hostname = 'www.msdmed.ru' )
GROUP BY
1
ORDER BY
2 ASC
BigQuery returns the error: Unrecognized name: clientId
I don't understand what's wrong with this query, because clientId is a default field in the BQ schema.
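The outer query can see only the fields listed in the inner query's SELECT list. Either remove clientId from the outer query or add clientId explicitly to the inner query. If you keep both clientId and pagePath in the outer SELECT, group by both of them (GROUP BY 1, 2), since every non-aggregated column must appear in GROUP BY.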
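The scoping rule can be reproduced on a small scale. The sketch below uses Python's built-in sqlite3 module rather than BigQuery, and the table `hits` with its `seconds` column is hypothetical, so the error text differs from BigQuery's, but the cause is the same: the outer query only sees what the inner SELECT projects.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for the unnested GA hits; not the real export schema.
conn.execute("CREATE TABLE hits (clientId TEXT, pagePath TEXT, seconds REAL)")
conn.executemany(
    "INSERT INTO hits VALUES (?, ?, ?)",
    [("c1", "/a", 10.0), ("c1", "/a", 5.0), ("c2", "/b", 7.0)],
)

# The inner query does NOT project clientId, so the outer query cannot see it.
broken = """
    SELECT clientId, pagePath, SUM(seconds) AS time_on_page
    FROM (SELECT pagePath, seconds FROM hits)
    GROUP BY 1, 2
"""
try:
    conn.execute(broken)
    error_message = None
except sqlite3.OperationalError as exc:
    error_message = str(exc)  # SQLite's counterpart of "Unrecognized name"

# Projecting clientId in the inner query makes it visible to the outer one.
fixed = """
    SELECT clientId, pagePath, SUM(seconds) AS time_on_page
    FROM (SELECT clientId, pagePath, seconds FROM hits)
    GROUP BY 1, 2
    ORDER BY 1, 2
"""
rows = conn.execute(fixed).fetchall()
print(error_message)
print(rows)
```

In the original BigQuery query the equivalent change would be to add clientId to the inner SELECT list next to hits.page.pagePath.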
I want to convert my PostgreSQL CTE query into a normal query, because CTEs are mainly used in data warehouse SQL and are not efficient for Postgres production DBs.
So I need help converting this CTE query into a normal query:
WITH
cohort AS (
SELECT
*
FROM (
select
activity_id,
ts,
customer,
activity,
case
when activity = 'completed_order' and lag(activity) over (partition by customer order by ts) != 'email'
then null
when activity = 'email' and lag(activity) over (partition by customer order by ts) !='email'
then 1
else 0
end as cndn
from activity_stream where customer in (select customer from activity_stream where activity='email')
order by ts
) AS s
)
(
select
*
from cohort as s
where cndn = 1 OR cndn is null order by ts)
You may just inline the CTE into your outer query:
select *
from
(
select activity_id, ts, customer, activity,
case when activity = 'completed_order' and lag(activity) over (partition by customer order by ts) != 'email'
then null
when activity = 'email' and lag(activity) over (partition by customer order by ts) !='email'
then 1
else 0
end as cndn
from activity_stream
where customer in (select customer from activity_stream where activity = 'email')
) as s
where cndn = 1 OR cndn is null
order by ts;
Note that you have an unnecessary subquery in the CTE, which does an ORDER BY which won't "stick" anyway. But other than this, you might want to keep your current code as is.
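To check that inlining changes nothing, here is a minimal sketch that runs the CTE form and the inlined form side by side. It assumes SQLite (3.25 or later, for window functions) as a stand-in for PostgreSQL, and the sample rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE activity_stream "
    "(activity_id INTEGER, ts INTEGER, customer TEXT, activity TEXT)"
)
conn.executemany(
    "INSERT INTO activity_stream VALUES (?, ?, ?, ?)",
    [
        (1, 1, "a", "visit"),
        (2, 2, "a", "email"),
        (3, 3, "a", "email"),
        (4, 4, "a", "completed_order"),
        (5, 5, "a", "visit"),
        (6, 6, "a", "completed_order"),
        (7, 1, "b", "completed_order"),  # customer b never got an email
    ],
)

# The shared body: identical text is used as the CTE and as the derived table.
inner = """
    select activity_id, ts, customer, activity,
           case when activity = 'completed_order'
                     and lag(activity) over (partition by customer order by ts) != 'email'
                then null
                when activity = 'email'
                     and lag(activity) over (partition by customer order by ts) != 'email'
                then 1
                else 0
           end as cndn
    from activity_stream
    where customer in (select customer from activity_stream where activity = 'email')
"""
cte_sql = (
    f"with cohort as ({inner}) "
    "select * from cohort where cndn = 1 or cndn is null order by ts"
)
inline_sql = (
    f"select * from ({inner}) as s "
    "where cndn = 1 or cndn is null order by ts"
)

cte_rows = conn.execute(cte_sql).fetchall()
inline_rows = conn.execute(inline_sql).fetchall()
print(cte_rows == inline_rows, [row[0] for row in inline_rows])
```

Both forms return the same rows here, which is consistent with the advice that the CTE can stay as it is.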
Here I have a sample table of website visitors. As we can see, visitors sometimes don't provide their email. They may also switch to a different email address over time.
**Original table:**
I want to update this table with the following requirements:
The first time a visitor provides an email, all of their past visits are tagged with that email.
All of their future visits are also tagged with that email until they switch to another email.
**Expected table after update:**
I was wondering if there is a way of doing this in Redshift or T-SQL?
Thanks everyone!
In SQL Server or Redshift, you can use a subquery to calculate the email:
select t.*,
coalesce(email,
max(email) over (partition by visitor_id, grp),
max(case when activity_date = first_email_date then email end) over (partition by visitor_id)
)
from (select t.*,
min(case when email is not null then activity_date end) over
(partition by visitor_id order by activity_date rows between unbounded preceding and current row) as first_email_date,
count(email) over (partition by visitor_id order by activity_date rows between unbounded preceding and current row) as grp
from t
) t;
You can then use this in an update:
update t
set email = tt.imputed_email
from (select t.*,
coalesce(email,
max(email) over (partition by visitor_id, grp),
max(case when activity_date = first_email_date then email end) over (partition by visitor_id)
) as imputed_email
from (select t.*,
min(case when email is not null then activity_date end) over
(partition by visitor_id order by activity_date) as first_email_date,
count(email) over (partition by visitor_id order by activity_date) as grp
from t
) t
) tt
where tt.visitor_id = t.visitor_id and tt.activity_date = t.activity_date and
t.email is null;
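A small runnable sketch of the imputation query, assuming SQLite (3.25+) in place of SQL Server or Redshift and an invented table `t` with ISO-formatted dates. The derived-table alias `s` and the sample addresses are mine, not from the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (visitor_id INTEGER, activity_date TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        (1, "2021-01-01", None),
        (1, "2021-01-02", None),
        (1, "2021-01-03", "a@example.com"),
        (1, "2021-01-04", None),
        (1, "2021-01-05", "b@example.com"),
        (1, "2021-01-06", None),
        (2, "2021-01-01", None),  # visitor who never provides an email
    ],
)

query = """
select visitor_id, activity_date,
       coalesce(email,
                -- latest email so far: null rows share grp with the email before them
                max(email) over (partition by visitor_id, grp),
                -- otherwise the visitor's first email, for visits that precede it
                max(case when activity_date = first_email_date then email end)
                    over (partition by visitor_id)
       ) as imputed_email
from (select t.*,
             min(case when email is not null then activity_date end) over
                 (partition by visitor_id order by activity_date
                  rows between unbounded preceding and current row) as first_email_date,
             count(email) over
                 (partition by visitor_id order by activity_date
                  rows between unbounded preceding and current row) as grp
      from t
     ) s
order by visitor_id, activity_date
"""
imputed = [row[2] for row in conn.execute(query)]
print(imputed)
```

count(email) only increases on rows where email is not null, so each email and the empty rows after it end up with the same grp value; that is what lets max(email) carry the value forward.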
If we suppose that the table is named Visits and that its primary key consists of the columns Visitor_id and Activity_Date, then you can do the following in T-SQL.
Using a correlated subquery:
update a
set a.Email = coalesce(
-- select the email used previously
(
select top 1 Email from Visits
where Email is not null and Activity_Date < a.Activity_Date and Visitor_id = a.Visitor_id
order by Activity_Date desc
),
-- if there was no email used previously then select the email used next
(
select top 1 Email from Visits
where Email is not null and Activity_Date > a.Activity_Date and Visitor_id = a.Visitor_id
order by Activity_Date
)
)
from Visits a
where a.Email is null;
Using a window function to provide the ordering:
update v
set Email = vv.Email
from Visits v
join (
select
v.Visitor_id,
coalesce(a.Email, b.Email) as Email,
v.Activity_Date,
row_number() over (partition by v.Visitor_id, v.Activity_Date
order by a.Activity_Date desc, b.Activity_Date) as Row_num
from Visits v
-- previous visits with email
left join Visits a
on a.Visitor_id = v.Visitor_id
and a.Email is not null
and a.Activity_Date < v.Activity_Date
-- next visits with email if there are no previous visits
left join Visits b
on b.Visitor_id = v.Visitor_id
and b.Email is not null
and b.Activity_Date > v.Activity_Date
and a.Visitor_id is null
where v.Email is null
) vv
on vv.Visitor_id = v.Visitor_id
and vv.Activity_Date = v.Activity_Date
where
vv.Row_num = 1;
For each visitor_id you can update the null email value with the previous non-null value. If there is none, you use the next non-null value. You can get those values as follows:
select
v.*, v_prev.email prev_email, v_next.email next_email
from
visits v
left join visits v_prev on v.visitor_id = v_prev.visitor_id
and v_prev.activity_date = (select max(v2.activity_date) from visits v2 where v2.visitor_id = v.visitor_id and v2.activity_date < v.activity_date and v2.email is not null)
left join visits v_next on v.visitor_id = v_next.visitor_id
and v_next.activity_date = (select min(v2.activity_date) from visits v2 where v2.visitor_id = v.visitor_id and v2.activity_date > v.activity_date and v2.email is not null)
where
v.email is null
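The rule "previous non-null email, else next non-null email" can also be stated outside SQL. The following is only an illustrative sketch in plain Python, with a hypothetical `fill_emails` helper and made-up rows, to make the intended result explicit.

```python
from itertools import groupby


def fill_emails(visits):
    """Fill missing emails per visitor: previous non-null, else next non-null.

    visits: iterable of (visitor_id, activity_date, email_or_None) tuples.
    """
    result = []
    ordered = sorted(visits, key=lambda v: (v[0], v[1]))
    for visitor_id, group in groupby(ordered, key=lambda v: v[0]):
        rows = list(group)
        # Visits before the first email take that first email (the "next" value).
        current = next((e for _, _, e in rows if e is not None), None)
        for _, activity_date, email in rows:
            if email is not None:
                current = email  # from here on this is the "previous" value
            result.append((visitor_id, activity_date, current))
    return result


sample = [
    (1, "2021-01-01", None),
    (1, "2021-01-02", "a@example.com"),
    (1, "2021-01-03", None),
    (1, "2021-01-04", "b@example.com"),
    (1, "2021-01-05", None),
    (2, "2021-01-01", None),
]
filled = fill_emails(sample)
print([email for _, _, email in filled])
```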
I want to find the previous page for hits where the current page is a product page.
For example, the current page is 'https://www.emag.ro/telefon-mobil-apple-iphone-x-64gb-4g-space-grey-mqac2rm-a/pd/DN094NBBM' and the previous page is 'https://www.emag.ro/search/telefoane-mobile/IPHONE/c?ref=srcql'.
How can I use hitnumber to return how many users had this behavior?
I tried these two queries and I want to JOIN them, but I don't know the best way to do it.
I also tried the LAG function, but I am not sure whether it catches all the users.
Thank you in advance.
with
view_product as (
SELECT
ga.fullVisitorId AS GA_USER_ID,
date as date,
h.hitnumber as hitnumber,
CONCAT(ga.fullVisitorId, cast(ga.visitId AS string)) AS SessionID,
(SELECT VALUE FROM h.customDimensions WHERE INDEX = 10) AS PAGETYPE,
(SELECT VALUE FROM h.customDimensions WHERE index =8) as ref_parameter,
visitid as visitid,
h.page.pagePath as page_path
FROM
`emagbigquery.0` ga,
UNNEST(hits) AS h
WHERE h.type='PAGE'
AND _TABLE_SUFFIX = '20190115'
AND (SELECT VALUE FROM h.customDimensions WHERE INDEX = 10) = 'viewproduct'
)
,
SEARCH_page_WITH_REF_SRCQL as (
select
date as date,
ga.fullVisitorId AS GA_USER_ID,
h.hitnumber as hitnumber,
CONCAT(ga.fullVisitorId, cast(ga.visitId AS string)) AS SessionID,
(SELECT VALUE FROM h.customDimensions WHERE INDEX = 10) AS PAGETYPE,
(SELECT VALUE FROM h.customDimensions WHERE index =8) as ref_parameter,
visitid as visitid,
h.page.pagePath as page_path
FROM
`emagbigquery.0` ga,
UNNEST(hits) AS h
WHERE h.type='PAGE'
AND _TABLE_SUFFIX = '20190115'
AND (SELECT VALUE FROM h.customDimensions WHERE INDEX = 10) = 'search'
AND (SELECT VALUE FROM h.customDimensions WHERE index =8) LIKE 'srcql'
)
select
COUNT(DISTINCT GA_USER_ID) AS USERS,
COUNT(DISTINCT SessionID) AS SESSIONS,
previous_page_from_srcql
from (
select
t1.ga_user_id,
t1.sessionid,
t2.hitnumber > t1.hitnumber as previous_page_from_srcql
from SEARCH_page_WITH_REF_SRCQL as t1
inner join view_product as t2
on t1.ga_user_id = t2.ga_user_id
)
group by
previous_page_from_srcql
Try UNNEST WITH OFFSET. It can give you an easy way to later determine that one row came after the other:
WITH path_and_prev AS (
SELECT ARRAY(
SELECT AS STRUCT session.page.pagePath
, LAG(session.page.pagePath) OVER(ORDER BY i) prevPagePath
FROM UNNEST(hits) session WITH OFFSET i
) x
FROM `google.com:analytics-bigquery.LondonCycleHelmet.ga_sessions_20130910`
)
SELECT COUNT(*) c, pagePath, prevPagePath
FROM path_and_prev, UNNEST(x)
WHERE pagePath='/vests/yellow.html'
AND prevPagePath='/vests/'
GROUP BY 2,3
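For readers who find the array subquery hard to follow, the logic can be mirrored in plain Python: enumerate() plays the role of WITH OFFSET, and looking one position back plays the role of LAG. The function name and the sample sessions below are hypothetical.

```python
def count_transitions(sessions, page, prev_page):
    """Count hits on `page` whose immediately preceding hit was `prev_page`.

    sessions: list of sessions, each a list of page paths in hit order.
    """
    count = 0
    for hits in sessions:
        # i is the offset within the session, like UNNEST(hits) WITH OFFSET i
        for i, path in enumerate(hits):
            prev_path = hits[i - 1] if i > 0 else None  # like LAG(...) OVER (ORDER BY i)
            if path == page and prev_path == prev_page:
                count += 1
    return count


sessions = [
    ["/", "/vests/", "/vests/yellow.html"],
    ["/vests/", "/", "/vests/yellow.html"],
    ["/vests/", "/vests/yellow.html", "/vests/", "/vests/yellow.html"],
]
transitions = count_transitions(sessions, "/vests/yellow.html", "/vests/")
print(transitions)
```

Like the query above, this counts matching hits rather than distinct users or sessions.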
I get this error:
Msg 8156, Level 16, State 1, Line 67
The column 'MANDT' was specified multiple times for 'cte'.
It occurs when I run the code below, even though I am not including the column MANDT in my query. Both tables that I am joining have a column MANDT, but they also both have the column STAT. I did not have a problem running the same join against another table; the only difference is that that table did not have MANDT, only STAT was shared.
I also tried including both MANDT columns with aliases (JCDS_SOGR.MANDT as Client and TJ30T.MANDT as Client2), separately and together. That did not work either; I got the same error message.
;WITH cte AS
(
SELECT
ROW_NUMBER() OVER (PARTITION BY STAT ORDER BY UDATE) AS Rn,
*,
LAG(UDATE) OVER (PARTITION BY STAT ORDER BY UDATE) AS PrevUDate,
COUNT(*) OVER (PARTITION BY STAT) AS [Count]
FROM
JCDS_SOGR
JOIN
TJ30T on JCDS_SOGR.STAT = TJ30T.ESTAT
WHERE
OBJNR = 'IE000000000010003137'
)
SELECT
MAX(rn) AS [Count],
OBJNR, STAT, TXT30,
SUM(CASE
WHEN rn % 2 = 0
THEN DATEDIFF(d, PrevUDate, UDATE)
WHEN rn = [Count]
THEN DATEDIFF(d, UDATE, GETDATE())
ELSE 0
END) AS DIF
FROM
cte
GROUP BY
OBJNR, STAT, TXT30
This is the other query I referred to that works fine with this same code.
;with cte
AS
(
select ROW_NUMBER() OVER(partition by STAT Order by UDATE ) as Rn
, *
, LAG(UDATE) OVER(partition by STAT Order by UDATE ) As PrevUDate
, COUNT(*) OVER(partition by STAT) As [Count]
from JCDS_SOGR
join TJ02T on JCDS_SOGR.STAT = TJ02T.ISTAT
where OBJNR = 'IE000000000010003137'
and TJ02T.SPRAS = 'E'
)
select Max(rn) As [Count]
, OBJNR,STAT,TXT30
, SUM(CASE WHEN rn%2=0 THEN DATEDIFF(d,PrevUDate,UDATE)
WHEN rn=[Count] THEN DATEDIFF(d,UDATE,getDate())
ELSE 0 END) as DIF
from cte
group BY OBJNR, STAT,TXT30
The expected result is this:
COUNT  OBJNR                 STAT   TXT30      DIF
1      IE000000000010003137  I0099  Available  2810
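In your CTE, you are selecting *. Because both JCDS_SOGR and TJ30T have a column named MANDT, the * expands to two columns with the same name, and the column names of a CTE must be unique. That is why the error appears even though you never mention MANDT yourself, and why adding aliased copies does not help while the * is still there. Remove the * and list only the columns you need (for example OBJNR, STAT, UDATE and TXT30), qualifying them with the table name where necessary. That should fix the problem you described.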
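The duplicate name is easy to see by inspecting the columns that * produces. The sketch below uses SQLite through Python's sqlite3 module instead of SQL Server, with a reduced, hypothetical set of columns for the two tables; SQLite does not raise the same error, but its result metadata shows MANDT twice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Reduced, assumed column lists; the real SAP tables have more columns.
conn.execute("CREATE TABLE JCDS_SOGR (MANDT TEXT, OBJNR TEXT, STAT TEXT, UDATE TEXT)")
conn.execute("CREATE TABLE TJ30T (MANDT TEXT, ESTAT TEXT, TXT30 TEXT)")

# SELECT * over the join exposes every column of both tables.
star = conn.execute(
    "SELECT * FROM JCDS_SOGR JOIN TJ30T ON JCDS_SOGR.STAT = TJ30T.ESTAT"
)
star_columns = [d[0] for d in star.description]

# An explicit column list keeps each output name unique.
explicit = conn.execute(
    "SELECT j.OBJNR, j.STAT, j.UDATE, t.TXT30 "
    "FROM JCDS_SOGR j JOIN TJ30T t ON j.STAT = t.ESTAT"
)
explicit_columns = [d[0] for d in explicit.description]

print(star_columns)
print(explicit_columns)
```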
SELECT
transaction
,date
,mail
,status
,ROW_NUMBER() OVER (PARTITION BY mail ORDER BY date) AS rownum
FROM table1
Given the above table and script, I want to filter the transactions as follows: if the first 3 rows (by rownum) for a mail have status 'failed', show row 4 if it also failed; if rows 4, 5 and 6 failed, show row 7 if it also failed, and so on. I was thinking about loading the data into a pandas DataFrame and running a simple lambda function there, but I would really like to find a solution in SQL only.
You could use lead() and lag() to explicitly check:
select t.*
from (select t1.*,
lag(status, 3) over (partition by mail order by date) as status_3,
lag(status, 2) over (partition by mail order by date) as status_2,
lag(status, 1) over (partition by mail order by date) as status_1,
lead(status, 1) over (partition by mail order by date) as status_1n,
lead(status, 2) over (partition by mail order by date) as status_2n,
lead(status, 3) over (partition by mail order by date) as status_3n
from table1 t1
) t
where status = 'FAILED' and
( (status_3 = 'FAILED' and status_2 = 'FAILED' and status_1 = 'FAILED') or
(status_2 = 'FAILED' and status_1 = 'FAILED' and status_1n = 'FAILED') or
(status_1 = 'FAILED' and status_1n = 'FAILED' and status_2n = 'FAILED') or
(status_1n = 'FAILED' and status_2n = 'FAILED' and status_3n = 'FAILED')
)
This is a bit brute force, but I think the logic is quite clear.
You could simplify the logic to:
where regexp_like(status_3 || status_2 || status_1 || status || status_1n || status_2n || status_3n,
'(FAILED){4}'
)
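A note on the quantifier: in a regular expression, {4} applies only to the element directly before it. So FAILED{4} means "FAILE" followed by four "D"s, whereas (FAILED){4} means the whole word repeated four times. The quick check below uses Python's re module as a stand-in for Oracle's regex engine, which I am assuming behaves the same way for a pattern this simple.

```python
import re

four_failures = "FAILED" * 4          # what the concatenated window looks like
three_failures = "OK" + "FAILED" * 3  # only three consecutive failures

ungrouped = re.compile(r"FAILED{4}")   # quantifier binds to the final "D" only
grouped = re.compile(r"(FAILED){4}")   # quantifier binds to the whole word

print(bool(ungrouped.search(four_failures)))   # ungrouped pattern on four failures
print(bool(grouped.search(four_failures)))     # grouped pattern on four failures
print(bool(grouped.search(three_failures)))    # grouped pattern on three failures
print(bool(ungrouped.search("FAILEDDDD")))     # what the ungrouped pattern matches
```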
Try this:
select * from (
SELECT
transaction
,date
,mail
,status
,ROW_NUMBER() OVER (PARTITION BY mail ORDER BY date) AS rownum
FROM table1
WHERE status = 'FAILED' )
where mod(rownum, 3) = 1;
One option is to use window functions. Use lag to get the previous status value (based on specified ordering) and compare it with the current row's value and assign groups with a running sum. Then count the values in each group and finally filter for that condition.
SELECT t.*
FROM
( SELECT t.*,
count(*) over(PARTITION BY mail, grp) AS grp_count
FROM
( SELECT t.*,
sum(CASE
WHEN (prev_status IS NULL AND status='FAILED') OR
(prev_status='FAILED' AND status='FAILED') THEN 0
ELSE 1
END) over(PARTITION BY mail ORDER BY "date","transaction") AS grp
FROM
( SELECT t.*,
lag(status) over(PARTITION BY mail ORDER BY "date","transaction") AS prev_status
FROM tbl t
) t
) t
) t
WHERE grp_count>=4
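Here is a runnable sketch of the grouping query, assuming SQLite (3.25+) instead of Oracle, invented sample rows, and distinct subquery aliases (s1, s2, s3) that are mine. A run of three failures, one success, then five failures should leave only the last five rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE tbl ("transaction" INTEGER, "date" TEXT, mail TEXT, status TEXT)'
)
statuses = ["FAILED"] * 3 + ["OK"] + ["FAILED"] * 5
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?, ?, ?)",
    [(i, f"2021-01-{i:02d}", "m@example.com", s) for i, s in enumerate(statuses, start=1)],
)

query = """
SELECT s3."transaction"
FROM (SELECT s2.*,
             count(*) OVER (PARTITION BY mail, grp) AS grp_count
      FROM (SELECT s1.*,
                   -- the running sum stays flat inside a streak of failures
                   sum(CASE WHEN (prev_status IS NULL AND status = 'FAILED')
                              OR (prev_status = 'FAILED' AND status = 'FAILED')
                            THEN 0 ELSE 1 END)
                       OVER (PARTITION BY mail ORDER BY "date", "transaction") AS grp
            FROM (SELECT t.*,
                         lag(status) OVER (PARTITION BY mail
                                           ORDER BY "date", "transaction") AS prev_status
                  FROM tbl t
                 ) s1
           ) s2
     ) s3
WHERE grp_count >= 4
ORDER BY "date", "transaction"
"""
long_streak_ids = [row[0] for row in conn.execute(query)]
print(long_streak_ids)
```

Note that this returns every row of a streak of four or more failures, not only the fourth row onward.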
If you are using versions starting with Oracle 12c, there is an option to use MATCH_RECOGNIZE which would simplify this.
select *
from tbl
MATCH_RECOGNIZE (
PARTITION BY mail
ORDER BY "date" ,"transaction"
ALL ROWS PER MATCH
AFTER MATCH SKIP TO LAST FAIL
PATTERN(fail{4,})
DEFINE
fail AS (status='FAILED')
) MR
ORDER BY "date","transaction"