How can I find rows before a specific value? - SQL

I have the following rows, and what I want to do is select all the rows before the type "shop". I tried using CASE in the WHERE clause but didn't get any result. How can I do it?
|id|visitnumber|type |
|01| 1|register|
|01| 2|visit |
|01| 3|visit |
|01| 4|shop |
|01| 5|visit |
For example, what I want to get is the number of visits before type = "shop".
This would be very helpful, because what I'm trying to do is get all the actions that happened before a specific event in BigQuery.
|id|numberofvisits|
|01| 3|

One method uses correlated subqueries:
select id, count(*)
from t
where visitnumber < (select min(t2.visitnumber) from t t2 where t2.id = t.id and t2.type = 'shop')
group by id;
However, in BigQuery, I prefer an approach using window functions:
select id, countif(visitnumber < visitnumber_shop)
from (select t.*,
min(case when type = 'shop' then visitnumber end) over (partition by id) as visitnumber_shop
from t
) t
group by id;
This has the advantage of keeping all ids, even those that have no "shop" row; they get a count of 0.
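To try the window-function version locally, here is a minimal sketch that runs the same query shape in SQLite through Python's sqlite3 module. Assumptions: SQLite 3.25 or later (for window functions), the table name t from the answer, and a hypothetical id '02' that never reaches "shop". SQLite has no COUNTIF, so SUM over a 0/1 CASE stands in for it.

```python
import sqlite3

# The question's rows, plus a hypothetical id '02' that never reaches "shop".
ROWS = [
    ("01", 1, "register"),
    ("01", 2, "visit"),
    ("01", 3, "visit"),
    ("01", 4, "shop"),
    ("01", 5, "visit"),
    ("02", 1, "register"),
    ("02", 2, "visit"),
]

# Same shape as the BigQuery answer; SUM(CASE ...) replaces COUNTIF.
QUERY = """
SELECT id,
       SUM(CASE WHEN visitnumber < visitnumber_shop THEN 1 ELSE 0 END) AS number_of_visits
FROM (
    SELECT t.*,
           -- First 'shop' visit per id; NULL when the id has none.
           MIN(CASE WHEN type = 'shop' THEN visitnumber END)
               OVER (PARTITION BY id) AS visitnumber_shop
    FROM t
) AS t
GROUP BY id
ORDER BY id
"""


def visits_before_first_shop(rows):
    """Return (id, number_of_visits) pairs, one per id."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE t (id TEXT, visitnumber INTEGER, type TEXT)")
        con.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()


if __name__ == "__main__":
    # A NULL visitnumber_shop makes the comparison NULL, so id '02' gets 0.
    print(visits_before_first_shop(ROWS))  # [('01', 3), ('02', 0)]
```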

One option uses a subquery for filtering:
select id, count(*) number_of_visits
from mytable t
where t.visitnumber < (
select min(t1.visitnumber)
from mytable t1
where t1.id = t.id and t1.type = 'shop'
)
group by id
You can also use window functions:
select id, count(*) number_of_visits
from (
select
t.*,
countif(type = 'shop') over(partition by id order by visitnumber) has_shop
from mytable t
) t
where has_shop = 0
group by id
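The running-count version can be sketched the same way in SQLite (3.25 or later assumed). With an ORDER BY in the OVER clause, the sum runs from the first visit up to the current one, so it stays 0 until the first "shop". The extra ids '02' (never shops) and '03' (shops on its first visit) are hypothetical; they show that this variant counts every row for an id without a shop and omits an id whose first visit is a shop.

```python
import sqlite3

ROWS = [
    ("01", 1, "register"), ("01", 2, "visit"), ("01", 3, "visit"),
    ("01", 4, "shop"), ("01", 5, "visit"),
    ("02", 1, "register"), ("02", 2, "visit"),  # hypothetical: never shops
    ("03", 1, "shop"), ("03", 2, "visit"),      # hypothetical: shops first
]

QUERY = """
SELECT id, COUNT(*) AS number_of_visits
FROM (
    SELECT t.*,
           -- Running count of 'shop' rows up to and including this visit
           -- (SUM over a 0/1 CASE stands in for BigQuery's COUNTIF).
           SUM(CASE WHEN type = 'shop' THEN 1 ELSE 0 END)
               OVER (PARTITION BY id ORDER BY visitnumber) AS has_shop
    FROM mytable AS t
) AS t
WHERE has_shop = 0
GROUP BY id
ORDER BY id
"""


def visits_before_shop_running_count(rows):
    """Return (id, number_of_visits) for ids with at least one pre-shop row."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE mytable (id TEXT, visitnumber INTEGER, type TEXT)")
        con.executemany("INSERT INTO mytable VALUES (?, ?, ?)", rows)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()


if __name__ == "__main__":
    print(visits_before_shop_running_count(ROWS))  # [('01', 3), ('02', 2)]
```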

The option below is for BigQuery Standard SQL:
#standardSQL
SELECT id,
ARRAY_LENGTH(SPLIT(REGEXP_EXTRACT(',' || STRING_AGG(type ORDER BY visitnumber), r'(.*?),shop'))) - 1 AS number_of_visits_before_first_shop
FROM `project.dataset.table`
GROUP BY id
You can test and play with the query above using dummy data, as in the example below:
#standardSQL
WITH `project.dataset.table` AS (
SELECT '01' id, 1 visitnumber, 'register' type UNION ALL
SELECT '01', 2, 'visit' UNION ALL
SELECT '01', 3, 'visit' UNION ALL
SELECT '01', 4, 'shop' UNION ALL
SELECT '01', 5, 'visit' UNION ALL
SELECT '02', 1, 'register' UNION ALL
SELECT '02', 2, 'visit' UNION ALL
SELECT '02', 3, 'visit' UNION ALL
SELECT '03', 1, 'shop' UNION ALL
SELECT '03', 2, 'shop' UNION ALL
SELECT '03', 3, 'visit'
)
SELECT id,
ARRAY_LENGTH(SPLIT(REGEXP_EXTRACT(',' || STRING_AGG(type ORDER BY visitnumber), r'(.*?),shop'))) - 1 AS number_of_visits_before_first_shop
FROM `project.dataset.table`
GROUP BY id
with the result:
Row id number_of_visits_before_first_shop
1 01 3
2 02 null
3 03 0
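To see why the expression yields 3, null and 0, here is a rough Python mirror of its string logic. It is an illustration, not BigQuery itself, and it assumes the types for one id are already sorted by visitnumber. The last check shows a side effect of the pattern: nothing anchors the end of the word, so a hypothetical type such as "shopping" would also match.

```python
import re


def visits_before_first_shop(types_in_visit_order):
    """Mimic ARRAY_LENGTH(SPLIT(REGEXP_EXTRACT(',' || STRING_AGG(...), r'(.*?),shop'))) - 1."""
    # STRING_AGG(type ORDER BY visitnumber) uses ',' as its default delimiter;
    # the leading ',' means every type, including the first, follows a comma.
    aggregated = "," + ",".join(types_in_visit_order)
    # REGEXP_EXTRACT returns the capture group of the first match, or NULL.
    match = re.search(r"(.*?),shop", aggregated)
    if match is None:
        return None  # NULL propagates through SPLIT and ARRAY_LENGTH
    # Splitting '' gives [''] (as BigQuery's SPLIT does), and the capture
    # always starts with an empty element before its first comma, which is
    # the element the "- 1" removes.
    return len(match.group(1).split(",")) - 1


if __name__ == "__main__":
    print(visits_before_first_shop(["register", "visit", "visit", "shop", "visit"]))  # 3
    print(visits_before_first_shop(["register", "visit", "visit"]))  # None
    print(visits_before_first_shop(["shop", "shop", "visit"]))  # 0
```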

This is the query I ran in BigQuery on an Analytics 360 test dataset:
select
id,
visitnumber,
countif(hit_number < hitnumber_quickviewclick) as hitsprev_quickviewclick
from (
select
a.fullVisitorID as id,
a.visitnumber as visitnumber,
h.hitNumber as hit_number,
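-- hitNumber restarts at 1 in every session, so find the first click per visitor and session
MIN(case when h.eventInfo.eventAction = 'Quickview Click' then h.hitNumber end) over (partition by a.fullVisitorID, a.visitnumber) as hitnumber_quickviewclick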
FROM `bigquery-public-data.google_analytics_sample.ga_sessions_20170725` as a
CROSS JOIN UNNEST(hits) as h
) as T
group by 1,2;
I wanted a query that finds the total number of hits before the event action 'Quickview Click' occurs. If this is wrong or can be improved, let me know!
Thanks a lot, guys!

This is how I would approach it in SQL in general:
select yt.id, count(*)
from yourtable yt
where yt.type <> 'shop' and not exists (
select 1
from yourtable yt2
where yt2.id = yt.id and yt2.visitnumber < yt.visitnumber and yt2.type = 'shop'
)
group by yt.id
However, I would also think about the case where you want the visits before each shop, not only the first one: the visits before the next shop, and the one after that, and so on. For that purpose, you could find the visitnumber of each shop and group the visits into the intervals between consecutive shops.
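One way to build those intervals, sketched here in SQLite (3.25 or later assumed) with a hypothetical history containing two shops: a running count of the shops strictly before each row gives an interval number, and each shop row keeps the same number as the visits that lead up to it. The table name yourtable follows the answer; the output column names are illustrative.

```python
import sqlite3

# Hypothetical history with two shops, to show one group per shop interval.
ROWS = [
    ("01", 1, "register"), ("01", 2, "visit"), ("01", 3, "visit"),
    ("01", 4, "shop"), ("01", 5, "visit"), ("01", 6, "visit"),
    ("01", 7, "shop"), ("01", 8, "visit"),
]

QUERY = """
SELECT id,
       shop_interval,
       SUM(CASE WHEN type <> 'shop' THEN 1 ELSE 0 END) AS visits_before_shop,
       MAX(CASE WHEN type = 'shop' THEN visitnumber END) AS shop_visitnumber
FROM (
    SELECT t.*,
           -- Number of shops strictly before this row; the frame is empty for
           -- the first row, so SUM returns NULL there and COALESCE makes it 0.
           COALESCE(SUM(CASE WHEN type = 'shop' THEN 1 ELSE 0 END) OVER (
               PARTITION BY id ORDER BY visitnumber
               ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING), 0) AS shop_interval
    FROM yourtable AS t
) AS t
GROUP BY id, shop_interval
ORDER BY id, shop_interval
"""


def visits_per_shop_interval(rows):
    """Return (id, interval, visits, shop_visitnumber); the last may be None."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE yourtable (id TEXT, visitnumber INTEGER, type TEXT)")
        con.executemany("INSERT INTO yourtable VALUES (?, ?, ?)", rows)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()


if __name__ == "__main__":
    print(visits_per_shop_interval(ROWS))
    # [('01', 0, 3, 4), ('01', 1, 2, 7), ('01', 2, 1, None)]
```

A NULL shop_visitnumber marks the visits after the last shop, which have not (yet) led to a purchase.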

Related

Oracle SQL - Count based on a condition to include distinct rows with zero matches

Is there a "better" way to refactor the query below that returns the number of occurrences of a particular value (e.g. 'A') for each distinct id? The challenge seems to be keeping id = 2 in the result set even though the count is zero (id = 2 is never related to 'A'). It has a common table expression, NVL function, in-line view, distinct, and left join. Is all of that really needed to get this job done? (Oracle 19c)
create table T (id, val) as
select 1, 'A' from dual
union all select 1, 'B' from dual
union all select 1, 'A' from dual
union all select 2, 'B' from dual
union all select 2, 'B' from dual
union all select 3, 'A' from dual
;
with C as (select id, val, count(*) cnt from T where val = 'A' group by id, val)
select D.id, nvl(C.cnt, 0) cnt_with_zero from (select distinct id from T) D left join C on D.id = C.id
order by id
;
ID CNT_WITH_ZERO
---------- -------------
1 2
2 0
3 1
A simple way is conditional aggregation:
select id,
sum(case when val = 'A' then 1 else 0 end) as num_As
from t
group by id;
If you have another table with one row per id, I would recommend:
select i.id,
(select count(*) from t where t.id = i.id and t.val = 'A') as num_As
from ids i;
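Both suggestions can be checked side by side with a small SQLite sketch. The ids table and its extra id 4 are hypothetical; they show the practical difference: conditional aggregation only reports ids that appear in t, while the scalar subquery reports every id in ids, with 0 where nothing matches.

```python
import sqlite3

T_ROWS = [(1, "A"), (1, "B"), (1, "A"), (2, "B"), (2, "B"), (3, "A")]
IDS = [1, 2, 3, 4]  # hypothetical ids table; id 4 has no rows in t

# Conditional aggregation: one pass over t, zeros come from the ELSE 0 branch.
CONDITIONAL_AGG = """
SELECT id, SUM(CASE WHEN val = 'A' THEN 1 ELSE 0 END) AS num_as
FROM t
GROUP BY id
ORDER BY id
"""

# Scalar subquery per id: COUNT(*) over no matching rows returns 0, not NULL.
SCALAR_SUBQUERY = """
SELECT i.id,
       (SELECT COUNT(*) FROM t WHERE t.id = i.id AND t.val = 'A') AS num_as
FROM ids AS i
ORDER BY i.id
"""


def run(query):
    """Load the sample tables into an in-memory database and run one query."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE t (id INTEGER, val TEXT)")
        con.execute("CREATE TABLE ids (id INTEGER PRIMARY KEY)")
        con.executemany("INSERT INTO t VALUES (?, ?)", T_ROWS)
        con.executemany("INSERT INTO ids VALUES (?)", [(i,) for i in IDS])
        return con.execute(query).fetchall()
    finally:
        con.close()


if __name__ == "__main__":
    print(run(CONDITIONAL_AGG))  # [(1, 2), (2, 0), (3, 1)]
    print(run(SCALAR_SUBQUERY))  # [(1, 2), (2, 0), (3, 1), (4, 0)]
```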

Find record closest to a given date for each group - SQL

I am new to sql. Suppose we have a table like this:
+-------+----------+-----------+
|userid | statusid | date |
+-------+----------+-----------+
| 1 | 1 | 2018-10-10|
| 1 | 2 | 2018-10-12|
| 2 | 1 | 2018-09-25|
| 2 | 1 | 2018-10-01|
+-------+----------+-----------+
I need to get the statusid of each userid for a date as close to a given one as possible. Say my given date is 2018-10-01. How would I do that? I tried various GROUP BYs and PARTITION BYs, but nothing works. Could someone please help?
EDIT: my DB is Amazon Redshift.
You can use the row_number() window analytic function, ordered by the absolute value of the date difference.
(Note that row_number() is not available in MySQL before 8.0, so the MySQL solution below does not use it; it emulates the numbering with user variables and still orders by abs().)
Since I don't know your DBMS, here are versions for several of them.
This solution is for Oracle:
with tab(userid, statusid, "date") as
(
select 1,1,date'2018-10-10' from dual union all
select 1,2,date'2018-10-12' from dual union all
select 2,1,date'2018-09-25' from dual union all
select 2,1,date'2018-10-02' from dual
)
select tt.userid, tt.statusid, tt."date"
from
(
select t.userid, t.statusid , t."date",
row_number() over (partition by t.userid
order by abs("date" - date'2018-10-01')) as rn
from tab t
) tt
where tt.rn = 1
This solution is for SQL Server:
with tab([userid], [statusid], [date]) as
(
select 1,1,'2018-10-10' union all
select 1,2,'2018-10-12' union all
select 2,1,'2018-09-25' union all
select 2,1,'2018-10-02'
)
select tt.[userid], tt.[statusid], tt.[date]
from
(
select t.[userid], t.[statusid] , t.[date],
row_number() over (partition by t.[userid]
order by abs(datediff(day,[date],'2018-10-01'))) as rn
from tab t
) tt
where tt.rn = 1
This solution is for MySQL:
select tt.userid, tt.statusid, tt.date
from
(
select t.userid, t.statusid , t.date,
@rn := if(@iter = t.userid, @rn + 1, 1) as rn,
@iter := t.userid,
abs(datediff(t.date, '2018-10-01')) as df
from tab t
join (select @iter := 0, @rn := 0) as q_iter
order by t.userid, abs(datediff(t.date, '2018-10-01'))
) tt
where tt.rn = 1
This solution is for PostgreSQL:
with tab(userid, statusid, date) as
(
select 1,1,'2018-10-10' union all
select 1,2,'2018-10-12' union all
select 2,1,'2018-09-25' union all
select 2,1,'2018-10-02'
)
select tt.userid, tt.statusid, tt.date
from
(
select t.userid, t.statusid , t.date,
row_number() over (partition by t.userid
order by abs(date::date-'2018-10-01'::date)) as rn
from tab t
) tt
where tt.rn = 1
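The same ROW_NUMBER idea can be tried in SQLite (3.25 or later assumed) from Python, using the question's rows. SQLite has no date subtraction, so julianday() converts the ISO-8601 text to day numbers; on Redshift or another engine, that part would use the engine's own date arithmetic (for example DATEDIFF). Ties (two dates equally far from the target) are broken arbitrarily here.

```python
import sqlite3

ROWS = [
    (1, 1, "2018-10-10"),
    (1, 2, "2018-10-12"),
    (2, 1, "2018-09-25"),
    (2, 1, "2018-10-01"),
]

QUERY = """
SELECT userid, statusid, "date"
FROM (
    SELECT t.*,
           ROW_NUMBER() OVER (
               PARTITION BY userid
               -- Distance in days; ABS treats earlier and later dates alike.
               ORDER BY ABS(julianday("date") - julianday(:target))
           ) AS rn
    FROM t
) AS tt
WHERE rn = 1
ORDER BY userid
"""


def closest_status(rows, target):
    """Return the (userid, statusid, date) row closest to target for each user."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute('CREATE TABLE t (userid INTEGER, statusid INTEGER, "date" TEXT)')
        con.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)
        return con.execute(QUERY, {"target": target}).fetchall()
    finally:
        con.close()


if __name__ == "__main__":
    print(closest_status(ROWS, "2018-10-01"))
    # [(1, 1, '2018-10-10'), (2, 1, '2018-10-01')]
```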
Usually for this type of problem, you want the date on or before the given date.
If so:
select t.*
from t
where t.date = (select max(t2.date)
from t t2
where t2.userid = t.userid and t2.date <= '2018-10-01'
);
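Here is a small SQLite sketch of the "on or before" variant, using the question's rows. It relies on the dates being stored as ISO-8601 text, which compares in date order; with a real date column no such assumption is needed. Note that user 1 disappears for 2018-10-01, because it has no status on or before that date.

```python
import sqlite3

ROWS = [
    (1, 1, "2018-10-10"),
    (1, 2, "2018-10-12"),
    (2, 1, "2018-09-25"),
    (2, 1, "2018-10-01"),
]

# Correlated subquery: latest date per user that does not pass the target.
QUERY = """
SELECT t.*
FROM t
WHERE t."date" = (SELECT MAX(t2."date")
                  FROM t AS t2
                  WHERE t2.userid = t.userid AND t2."date" <= :target)
ORDER BY t.userid
"""


def status_on_or_before(rows, target):
    """Return each user's row with the latest date on or before target."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute('CREATE TABLE t (userid INTEGER, statusid INTEGER, "date" TEXT)')
        con.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)
        return con.execute(QUERY, {"target": target}).fetchall()
    finally:
        con.close()


if __name__ == "__main__":
    print(status_on_or_before(ROWS, "2018-10-01"))  # [(2, 1, '2018-10-01')]
```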

T-SQL to stamp individual rows

I'm looking for some T-SQL code that adds a column called tag, which tags each row with the same number until a value changes in any of the columns "team", "id", "kmvid", "name", "cid" and "pid". When there is a change, that row gets the next number in the sequence. The expected results were posted as an image.
You can do this with cumulative sums and lag():
select t.*,
sum(case when prev_team = team and
prev_id = id and
prev_kmvid = kmvid and
prev_name = name and
prev_cid = cid and
prev_pid = pid
then 0 else 1
end) over (order by date rows between unbounded preceding and current row) as Tag
from (select t.*,
lag(team) over (order by date) as prev_team,
lag(id) over (order by date) as prev_id,
lag(kmvid) over (order by date) as prev_kmvid,
lag(name) over (order by date) as prev_name,
lag(cid) over (order by date) as prev_cid,
lag(pid) over (order by date) as prev_pid
from t
) t;
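A minimal sketch of the lag-and-running-sum technique, run in SQLite (3.25 or later assumed) and using the #TestData rows posted in this thread. CountryId stands in for all six columns; with the real table, each column gets its own LAG and its own equality test inside the CASE. Because the first row's LAG is NULL, the comparison is not true there, so it counts as a change and starts the numbering at 1.

```python
import sqlite3

DATES_99 = [
    "2004-04-30", "2004-07-31", "2004-10-31", "2005-01-31",
    "2005-04-30", "2005-07-31", "2005-10-31", "2006-01-31",
    "2006-04-30", "2006-07-31", "2006-10-31", "2007-01-31",
]
ROWS = [("99", d) for d in DATES_99] + [
    ("HK", "2007-04-30"), ("CA", "2007-07-31"), ("HK", "2007-10-31"),
]

QUERY = """
SELECT CountryId, "Date",
       -- Add 1 whenever the value differs from the previous row's value.
       SUM(CASE WHEN CountryId = prev_country THEN 0 ELSE 1 END) OVER (
           ORDER BY "Date" ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS Tag
FROM (
    SELECT t.*, LAG(CountryId) OVER (ORDER BY "Date") AS prev_country
    FROM TestData AS t
) AS t
ORDER BY "Date"
"""


def tag_rows(rows):
    """Return (CountryId, Date, Tag) rows in date order."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute('CREATE TABLE TestData (CountryId TEXT, "Date" TEXT)')
        con.executemany("INSERT INTO TestData VALUES (?, ?)", rows)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()


if __name__ == "__main__":
    print([tag for _, _, tag in tag_rows(ROWS)])
    # [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 3, 4]
```

The explicit ROWS frame keeps rows with equal dates from sharing one running total, which the default RANGE frame would do.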
You should not use a cursor for things that are better done using set-based operations.
The following will give you the desired results based on the data you took a picture of...
IF OBJECT_ID('tempdb..#TestData', 'U') IS NOT NULL
DROP TABLE #TestData;
CREATE TABLE #TestData (
CountryId CHAR(2) NOT NULL,
[Date] DATETIME NOT NULL
);
INSERT #TestData (CountryId, Date)
SELECT '99', '2004-04-30' UNION ALL
SELECT '99', '2004-07-31' UNION ALL
SELECT '99', '2004-10-31' UNION ALL
SELECT '99', '2005-01-31' UNION ALL
SELECT '99', '2005-04-30' UNION ALL
SELECT '99', '2005-07-31' UNION ALL
SELECT '99', '2005-10-31' UNION ALL
SELECT '99', '2006-01-31' UNION ALL
SELECT '99', '2006-04-30' UNION ALL
SELECT '99', '2006-07-31' UNION ALL
SELECT '99', '2006-10-31' UNION ALL
SELECT '99', '2007-01-31' UNION ALL
SELECT 'HK', '2007-04-30' UNION ALL
SELECT 'CA', '2007-07-31' UNION ALL
SELECT 'HK', '2007-10-31';
-- SELECT * FROM #TestData td;
--=======================================
WITH
cte_TagGroup AS (
SELECT
td.CountryId,
td.[Date],
TagGroup = ROW_NUMBER() OVER (ORDER BY td.Date)
- ROW_NUMBER() OVER (PARTITION BY td.CountryId ORDER BY td.Date)
- CASE WHEN td.CountryId = LAG(td.CountryId, 1, td.CountryId) OVER (ORDER BY td.CountryId, td.Date) THEN 0 ELSE 1 END
FROM
#TestData td
)
SELECT
tg.CountryId,
tg.Date,
Tag = DENSE_RANK() OVER (ORDER BY tg.TagGroup)
FROM
cte_TagGroup tg;
That said, I suspect that the first 3 columns of your data don't always hold the same values, so you may need to add columns to the PARTITION BY clauses to fit your actual data.
HTH,
Jason

Select value based on priority of another column Oracle SQL

I would like to select only one email address per id. If the id has both a work and a personal email, I would like to display only the work email.
with emails as(
select '1' id, 'work' email_type, 'abc@gmail.com' email from dual union all
select '2' id, 'work' email_type, '123@yahoo.com' email from dual union all
select '2' id, 'personal' email_type, '456@msn.com' email from dual union all
select '3' id, 'personal' email_type, 'test@work.com' email from dual
)
For this example I would like to display:
id email_type email
1 work abc@gmail.com
2 work 123@yahoo.com
3 personal test@work.com
You can prioritize those values in row_number and get the first row for each id.
select id,email_type,email
from (select id,email_type,email
,row_number() over(partition by id order by case when email_type='work' then 1 else 2 end) as rn
from emails) t
where rn = 1
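The priority ranking can be checked in SQLite (3.25 or later assumed) with the question's sample rows. The CASE expression maps 'work' to 1 and anything else to 2, so a work address always sorts first within its id, and any email_type other than 'work' is treated as lower priority.

```python
import sqlite3

ROWS = [
    ("1", "work", "abc@gmail.com"),
    ("2", "work", "123@yahoo.com"),
    ("2", "personal", "456@msn.com"),
    ("3", "personal", "test@work.com"),
]

QUERY = """
SELECT id, email_type, email
FROM (
    SELECT id, email_type, email,
           -- Priority 1 for work, 2 for everything else; rn = 1 is the winner.
           ROW_NUMBER() OVER (
               PARTITION BY id
               ORDER BY CASE WHEN email_type = 'work' THEN 1 ELSE 2 END
           ) AS rn
    FROM emails
) AS t
WHERE rn = 1
ORDER BY id
"""


def first_email_per_id(rows):
    """Return one (id, email_type, email) row per id, preferring work."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE emails (id TEXT, email_type TEXT, email TEXT)")
        con.executemany("INSERT INTO emails VALUES (?, ?, ?)", rows)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()


if __name__ == "__main__":
    for row in first_email_per_id(ROWS):
        print(row)
```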
Assuming the only possible values for email_type are work and personal, you can use the window function row_number():
select *
from (
select t.*,
row_number() over (
partition by id order by email_type desc
) as seqnum
from emails t
) t
where seqnum = 1;
You can use a subquery to figure out whether there is a work email. With your sample data, you can use the MAX function to return the "work" type if it exists (because 'work' sorts after 'personal'); if it doesn't exist, MAX just returns "personal". Joining back on that gives the appropriate result.
WITH T1 AS (SELECT id, MAX(email_type) AS e_type FROM table_name GROUP BY id)
SELECT T1.id, T1.e_type, T2.email
FROM T1 LEFT JOIN table_name T2 ON T1.id = T2.id AND T1.e_type = T2.email_type
ORDER BY T1.id

SQL: Count distinct of one column based on another column

I need to count the sessions that visited a particular page only once and the sessions that visited the same page more than once. For example, consider these sessions from 1st to 4th April:
Session_id| Date
----+---------
1| 01/04/2016
1| 02/04/2016
2| 01/04/2016
3| 01/04/2016
4| 01/04/2016
4| 03/04/2016
4| 04/04/2016
I cannot do it using a subquery because there are millions of sessions, so a query like this won't work for me:
select case when no_of_visits=1 then 'single_visit'
when no_of_visits>1 then 'multiple_visits' end as visit,
count(distinct session_id) as sessions
FROM (
select session_id,
count(distinct date) as no_of_visits
from my_table
group by session_id
) a
group by case when no_of_visits=1 then 'single_visit'
when no_of_visits>1 then 'multiple_visits' end
The answer should be like this:
Visit|Sessions
single_visit|2
multiple_visit|2
Is there any way I can do something like this:
count(distinct session_id) where no_of_visits=1 and count(distinct session_id) where no_of_visits>1, without a subquery or self join?
Any help would be deeply appreciated.
You can't avoid subqueries to get this result, but you can get rid of the count(distinct) (which is probably the most expensive part):
select no_of_visits,
count(*) as sessions
FROM
(
select session_id,
case when min(date) <> max(date) -- at least two different dates
then 'multiple_visits'
else 'single_visit'
end as no_of_visits
from table_name
group by session_id
) a
group by no_of_visits
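A quick SQLite check of the MIN/MAX classification on the question's sample, with the dates rewritten as ISO-8601 text. The column name visit_date is hypothetical (it avoids the reserved word date). A session is "multiple" exactly when its earliest and latest dates differ, which is the same as having at least two distinct dates.

```python
import sqlite3

ROWS = [
    (1, "2016-04-01"), (1, "2016-04-02"),
    (2, "2016-04-01"),
    (3, "2016-04-01"),
    (4, "2016-04-01"), (4, "2016-04-03"), (4, "2016-04-04"),
]

QUERY = """
SELECT visit, COUNT(*) AS sessions
FROM (
    SELECT session_id,
           -- Two different dates exist exactly when MIN and MAX differ.
           CASE WHEN MIN(visit_date) <> MAX(visit_date)
                THEN 'multiple_visits'
                ELSE 'single_visit'
           END AS visit
    FROM my_table
    GROUP BY session_id
) AS a
GROUP BY visit
ORDER BY visit
"""


def classify_sessions(rows):
    """Return (visit_label, session_count) pairs."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE my_table (session_id INTEGER, visit_date TEXT)")
        con.executemany("INSERT INTO my_table VALUES (?, ?)", rows)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()


if __name__ == "__main__":
    print(classify_sessions(ROWS))  # [('multiple_visits', 2), ('single_visit', 2)]
```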
It seems you want something like this:
select session_id, count(distinct dt) as no_of_visits,
case when count(distinct dt) = 1 then 'Single visit'
else 'Multiple visits' end as visit
from my_table
group by session_id;
Note: I used dt rather than "date" as a column name; date is an Oracle reserved word, and such words should not be used as schema, table or column names.
Oracle Setup:
CREATE TABLE table_name ( Session_id, "Date" ) AS
SELECT 1, DATE'2016-04-01' FROM DUAL UNION ALL
SELECT 1, DATE'2016-04-02' FROM DUAL UNION ALL
SELECT 2, DATE'2016-04-01' FROM DUAL UNION ALL
SELECT 3, DATE'2016-04-01' FROM DUAL UNION ALL
SELECT 4, DATE'2016-04-01' FROM DUAL UNION ALL
SELECT 4, DATE'2016-04-03' FROM DUAL UNION ALL
SELECT 4, DATE'2016-04-04' FROM DUAL;
Query 1:
SELECT visit,
COUNT(1) AS Sessions
FROM (
SELECT session_id,
CASE WHEN COUNT(1) > 1
THEN 'Multiple Visits'
ELSE 'Single Visit'
END AS visit
FROM table_name
GROUP BY session_id
)
GROUP BY visit;
Output:
VISIT SESSIONS
--------------- ----------
Multiple Visits 2
Single Visit 2
Query 2:
SELECT first_visit - multiple_visits AS single_visit,
multiple_visits
FROM (
SELECT COUNT( CASE rn WHEN 1 THEN 1 END ) AS first_visit,
COUNT( CASE rn WHEN 2 THEN 1 END ) AS multiple_visits
FROM (
SELECT rn --
FROM ( --
SELECT ROW_NUMBER() OVER ( PARTITION BY session_id ORDER BY ROWNUM ) AS rn
FROM table_name
) --
WHERE rn <= 2 -- These lines are optional
)
);
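Query 2's counting trick can also be tried in SQLite (3.25 or later assumed). Oracle's ROWNUM has no direct SQLite equivalent, so this sketch orders by a hypothetical visit_date column instead; any order works, because only the existence of a first and a second row per session matters. Every session has rn = 1, only repeat sessions have rn = 2, so the difference of the two counts is the number of single-visit sessions.

```python
import sqlite3

ROWS = [
    (1, "2016-04-01"), (1, "2016-04-02"),
    (2, "2016-04-01"),
    (3, "2016-04-01"),
    (4, "2016-04-01"), (4, "2016-04-03"), (4, "2016-04-04"),
]

QUERY = """
SELECT first_visit - multiple_visits AS single_visit,
       multiple_visits
FROM (
    SELECT COUNT(CASE rn WHEN 1 THEN 1 END) AS first_visit,
           COUNT(CASE rn WHEN 2 THEN 1 END) AS multiple_visits
    FROM (
        SELECT ROW_NUMBER() OVER (PARTITION BY session_id ORDER BY visit_date) AS rn
        FROM table_name
    ) AS numbered
    WHERE rn <= 2  -- optional: rows beyond the second never affect the counts
) AS counts
"""


def count_single_and_multiple(rows):
    """Return (single_visit, multiple_visits) session counts."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE table_name (session_id INTEGER, visit_date TEXT)")
        con.executemany("INSERT INTO table_name VALUES (?, ?)", rows)
        return con.execute(QUERY).fetchone()
    finally:
        con.close()


if __name__ == "__main__":
    print(count_single_and_multiple(ROWS))  # (2, 2)
```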
Output:
SINGLE_VISIT MULTIPLE_VISITS
------------ ---------------
2 2