Identify Sequence of Events In BigQuery


I needed help with some logic for the following dataset:
ID | POST10 | EVENTS_TIMESTAMP |
1 | picked | 2022.11.06 1:00pm|
1 | profile| 2022.11.06 1:30pm|
1 | front | 2022.11.06 1:35pm|
2 | profile| 2022.11.06 1:00pm|
2 | profile| 2022.11.06 1:30pm|
2 | front | 2022.11.06 1:35pm|
2 | front | 2022.11.06 1:36pm|
3 | picked | 2022.11.06 1:00pm|
3 | front | 2022.11.06 1:30pm|
3 | profile| 2022.11.06 1:35pm|
3 | front | 2022.11.06 1:36pm|
The logic should be:
For a person, the first value should be "picked", then "profile", and "front" must not occur in between those two values. It can occur before or after those two (based on timestamp), but not in between.
The answer for the dataset above would be:
ID | ANSWER |
1 | SELECTED |
2 | NOT SELECTED|
3 | NOT SELECTED|
I wrote the SQL, but the greater-than/less-than comparisons (<, >) aren't working as expected. It evaluates the second part after AND on its own. I need it to look inside the same window between picked and profile:
(case when
(min(case when (post10) like '%picked%' then EVENTS_TIMESTAMP else null end) over (partition by (ID))
>=
min(case when (post10) like '%profile%' then EVENTS_TIMESTAMP else null end) over (partition by (ID)))
AND
(min(case when (post10) like '%profile%' then EVENTS_TIMESTAMP else null end) over (partition by (ID))
>=
min(case when (post10) like '%front%' then EVENTS_TIMESTAMP else null end) over (partition by (ID)))
then 'SELECTED'
else 'NOT SELECTED' end) as ANSWER

You might consider the query below:
SELECT ID, IF(COUNTIF(flag) > 0, 'SELECTED', 'NOT SELECTED') AS ANSWER
FROM (
  SELECT *, POST10 = 'picked' AND LEAD(POST10) OVER w = 'profile' AS flag
  FROM sample_table
  WINDOW w AS (PARTITION BY ID ORDER BY PARSE_DATETIME('%Y.%m.%d %l:%M%p', EVENTS_TIMESTAMP))
)
GROUP BY ID;
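As a rough illustration of what the LEAD() comparison does, here is a minimal Python sketch over the sample rows from the question. It is not BigQuery code: the helper names (`parse_ts`, `classify`) are made up for this example, and it assumes that "nothing in between" means "profile" is the very next event after "picked" for that ID.

```python
from datetime import datetime
from itertools import groupby

# Sample rows from the question: (ID, POST10, EVENTS_TIMESTAMP)
rows = [
    (1, "picked", "2022.11.06 1:00pm"),
    (1, "profile", "2022.11.06 1:30pm"),
    (1, "front", "2022.11.06 1:35pm"),
    (2, "profile", "2022.11.06 1:00pm"),
    (2, "profile", "2022.11.06 1:30pm"),
    (2, "front", "2022.11.06 1:35pm"),
    (2, "front", "2022.11.06 1:36pm"),
    (3, "picked", "2022.11.06 1:00pm"),
    (3, "front", "2022.11.06 1:30pm"),
    (3, "profile", "2022.11.06 1:35pm"),
    (3, "front", "2022.11.06 1:36pm"),
]


def parse_ts(text):
    # Hypothetical helper: %I is Python's 12-hour clock directive.
    return datetime.strptime(text, "%Y.%m.%d %I:%M%p")


def classify(rows):
    """Return {ID: 'SELECTED' or 'NOT SELECTED'} using a LEAD-style check."""
    ordered = sorted(rows, key=lambda r: (r[0], parse_ts(r[2])))
    answers = {}
    for person_id, events in groupby(ordered, key=lambda r: r[0]):
        posts = [post for _, post, _ in events]
        # zip(posts, posts[1:]) pairs each event with the next one, like LEAD().
        flagged = any(
            cur == "picked" and nxt == "profile"
            for cur, nxt in zip(posts, posts[1:])
        )
        answers[person_id] = "SELECTED" if flagged else "NOT SELECTED"
    return answers


result = classify(rows)
print(result)
```

Note that this check, like the query, would also reject a sequence such as picked, some other event, profile, even if the event in between is not "front"; whether that matters depends on which POST10 values can occur in the real data.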

Related

How to apply different conditions for same column and output as new columns in Postgresql?

I have a Postgres table that looks like below
ip | up_score
-----------------+-------------------
223.110.181.122 | 1
242.123.249.85 | 0
10.110.11.1 | 1
10.254.253.1 | 1
19.7.40.40 | 0
242.123.249.85 | 1
10.110.11.1 | 1
19.7.40.40 | 0
10.254.253.1 | 0
223.110.181.122 | 0
19.7.40.40 | 0
10.254.253.1 | 1
Now I want a separate count of 0s and 1s per ip. I tried the queries below
select ip, count(up_score) from net_score where up_score = 0 group by ip;
select ip, count(up_score) from net_score where up_score = 1 group by ip;
But I want to combine these two queries together such that on a single execution I get the below result
ip | count_1 | count_0
-----------------+------------+-----------
223.110.181.122 | 1 | 1
242.123.249.85 | 1 | 1
10.110.11.1 | 2 | 0
10.254.253.1 | 2 | 1
19.7.40.40 | 0 | 3
How can I do this?

You could use a filter clause, something like this (untested):
select ip,
count(*) filter (where up_score = 0) AS count_0,
count(*) filter (where up_score = 1) AS count_1
from net_score group by ip;
Edit: unfortunately, the above does not work on PostgreSQL versions before 9.4.

Thanks to @w08r for his solution, but I found a simpler solution here (https://dba.stackexchange.com/a/112797/258199) that uses a CASE expression. I adapted it for my own use and am posting the query below:
SELECT ip,
COUNT(case when up_score = 0
then ip end) as count_0,
COUNT(case when up_score = 1
then ip end) as count_1
FROM net_score
GROUP BY ip;
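To check the CASE-based version against the sample data, here is a small sketch that loads the rows from the question into an in-memory SQLite database through Python's `sqlite3` module. SQLite stands in for PostgreSQL here purely as an assumption for convenience; the conditional aggregation itself is plain SQL.

```python
import sqlite3

# Rows copied from the question's net_score table: (ip, up_score)
scores = [
    ("223.110.181.122", 1), ("242.123.249.85", 0), ("10.110.11.1", 1),
    ("10.254.253.1", 1), ("19.7.40.40", 0), ("242.123.249.85", 1),
    ("10.110.11.1", 1), ("19.7.40.40", 0), ("10.254.253.1", 0),
    ("223.110.181.122", 0), ("19.7.40.40", 0), ("10.254.253.1", 1),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE net_score (ip TEXT, up_score INTEGER)")
conn.executemany("INSERT INTO net_score VALUES (?, ?)", scores)

# CASE yields NULL when the condition fails, and COUNT skips NULLs,
# so each COUNT only sees the rows matching its own condition.
query = """
SELECT ip,
       COUNT(CASE WHEN up_score = 1 THEN 1 END) AS count_1,
       COUNT(CASE WHEN up_score = 0 THEN 1 END) AS count_0
FROM net_score
GROUP BY ip
"""
counts = {ip: (ones, zeros) for ip, ones, zeros in conn.execute(query)}
for ip, (ones, zeros) in counts.items():
    print(ip, ones, zeros)
```

The counts should line up with the expected table in the question, one row per ip.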

MS-Access Query to PostgreSQL View

I am converting a Microsoft Access query into a PostgreSQL view. The query has obvious components that I have found reasonable answers to. However, I am still stuck on getting the final result:
SELECT All_Claim_Data.Sec_ID,
Sum(IIf([Type]="LODE",IIf([Status]="Active",1,0),0)) AS LD_Actv,
Sum(IIf([Type]="LODE",IIf([Loc_Date]>#8/31/2017#,IIf([Loc_Date]<#9/1/2018#,1,0),0),0)) AS LD_stkd_17_18,
Sum(IIf([Type]="LODE",IIf([Loc_Date]>#8/31/2016#,IIf([Loc_Date]<#9/1/2017#,1,0),0),0)) AS LD_stkd_16_17,
Sum(IIf([Type]="LODE",IIf([Loc_Date]<#1/1/1910#,IIf(IsNull([Clsd_Date]),1,(IIf([Clsd_Date]>#1/1/1900#,1,0))),0),0)) AS Actv_1900s,
Sum(IIf([Type]="LODE",IIf([Loc_Date]<#1/1/1920#,IIf(IsNull([Clsd_Date]),1,(IIf([Clsd_Date]>#1/1/1910#,1,0))),0),0)) AS Actv_1910s,
FROM All_Claim_Data.Sec_ID,
GROUP BY All_Claim_Data.Sec_ID,
HAVING (((Sum(IIf([casetype_txt]="LODE",1,0)))>0));
Realizing I need to use CASE SUM WHEN, here is what I have worked out so far:
CREATE OR REPLACE VIEW hgeditor.vw_test AS
SELECT All_Claim_Data.Sec_ID,
SUM (CASE WHEN(Type='LODE' AND WHEN(Status='Active',1,0),0)) AS LD_Actv,
SUM (CASE WHEN(Type='LODE' AND WHEN(Loc_Date>'8/31/2017' AND Loc_Date<'9/1/2018',1,0),0),0)) AS LD_stkd_17_18,
SUM (CASE WHEN(Type='LODE' AND WHEN(Loc_Date<'1/1/1910' AND (IsNull(Clsd_Date),1,(WHEN([Clsd_Date]>'1/1/1900',1,0))),0),0)) AS Actv_1900s
FROM All_Claim_Data.Sec_ID,
GROUP BY All_Claim_Data.Sec_ID,
HAVING (((SUM(IIf(Type='LODE',1,0)))>0));
The goal is to count the number of instances in which the Sec_ID has the following:
has (Type = LODE and Status = Active) = SUM integer
has (Type = LODE and Loc_Date between 8/31/2017 and 9/1/2018) = SUM Integer
My primary issue is getting a SUM integer to populate in the new columns

CASE expressions are the equivalent of the Access IIF() function, but WHEN isn't a function, so you don't pass it a set of parameters. Think of it as a tiny WHERE clause instead: it evaluates one or more predicates to determine what to do, and the action taken is whatever you specify after THEN.
CREATE OR REPLACE VIEW hgeditor.vw_test AS
SELECT
      All_Claim_Data.Sec_ID
    , SUM( CASE
             WHEN Type = 'LODE' AND
                  Status = 'Active' THEN 1
             ELSE 0
           END ) AS LD_Actv
    , SUM( CASE
             WHEN Type = 'LODE' AND
                  Loc_Date > to_date('08/31/2017','mm/dd/yyyy') AND
                  Loc_Date < to_date('09/1/2018','mm/dd/yyyy') THEN 1
             ELSE 0
           END ) AS LD_stkd_17_18
    , SUM( CASE
             WHEN Type = 'LODE' AND
                  Loc_Date < to_date('1/1/1910','mm/dd/yyyy') AND
                  -- the Access query also counted rows where Clsd_Date is NULL
                  (Clsd_Date IS NULL OR
                   Clsd_Date > to_date('1/1/1900','mm/dd/yyyy')) THEN 1
             ELSE 0
           END ) AS Actv_1900s
FROM All_Claim_Data
GROUP BY
      All_Claim_Data.Sec_ID
HAVING COUNT( CASE
                WHEN Type = 'LODE' THEN 1
              END ) > 0
;
By the way, you should NOT be relying on MM/DD/YYYY strings as dates in Postgres; unambiguous ISO 8601 literals (YYYY-MM-DD) are safer.
nb: Aggregate functions ignore NULL, take this example:
+----+-------+
| id | value |
+----+-------+
| 1  | x     |
| 2  | NULL  |
| 3  | x     |
| 4  | NULL  |
| 5  | x     |
+----+-------+
select
count(*) c_all
, count(value) c_value
from t
+-------+----------+
| c_all | c_value |
+-------+----------+
| 5 | 3 |
+-------+----------+
select
sum(case when value IS NOT NULL then 1 else 0 end) sum_case
, count(case when value IS NOT NULL then 1 end) count_case
from t
+----------+-------------+
| sum_case | count_case |
+----------+-------------+
| 3 | 3 |
+----------+-------------+
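One related detail that the tables above do not show is what happens when no row matches the condition at all. The sketch below is an assumption-laden illustration using Python's built-in `sqlite3` module rather than PostgreSQL, with the same five-row table `t`; it compares SUM without ELSE, SUM with ELSE 0, and COUNT for a value ('y') that never occurs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, value TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [(1, "x"), (2, None), (3, "x"), (4, None), (5, "x")],
)

# 'y' never appears, so every CASE without ELSE evaluates to NULL.
no_match = conn.execute("""
    SELECT SUM(CASE WHEN value = 'y' THEN 1 END)        AS sum_no_else,
           SUM(CASE WHEN value = 'y' THEN 1 ELSE 0 END) AS sum_else_0,
           COUNT(CASE WHEN value = 'y' THEN 1 END)      AS count_case
    FROM t
""").fetchone()
print(no_match)  # SUM over only NULLs is NULL (None in Python)
```

This is one reason the view spells out ELSE 0 inside SUM: the new columns then hold an integer 0 instead of NULL for a Sec_ID with no matching rows.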

Transpose row-columns and Counting number of instances per column in SQLITE

I have a table structured as
|creationDate|rule       |position|
|01.01.2018 |squid:S1132|12 |
|01.01.2018 |squid:S1132|14 |
|01.01.2018 |squid:S1132|19 |
|01.01.2018 |squid:S1121|12 |
|01.01.2018 |squid:S1121|14 |
|01.02.2018 |squid:S1130|12 |
My goal is to count the number of rules per date, reporting them in different columns.
|creationDate| S1132 | S1121 | S1130 |
|01.01.2018 | 3 |2 | 0 |
|01.02.2018 | 0 |0 | 1 |
I have a total of 180 rules...
Is it possible to make it in a single query?
Running this query
select creationDate , count("creationDate") as "squid:S1132"
from SONAR_ISSUES
where rule='squid:S1132' group by creationDate
I obtain this result
|creationDate|S1132 |
|01.01.2018 |3 |
I can do a similar query for each rule, but then, I am not able to merge them...

Try using CASE WHEN:
select creationDate ,count(case when rule='squid:S1132' then 1 end) as S1132,
count(case when rule='squid:S1121' then 1 end) as S1121,
count(case when rule='squid:S1130' then 1 end) as S1130
from SONAR_ISSUES
group by
creationDate

You can try using conditional aggregation
select
creationDate,
count(case when rule='squid:S1132' then "creationDate" end) as "squid:S1132",
count(case when rule='squid:S1121' then "creationDate" end) as "squid:S1121" ,
count(case when rule='squid:S1130' then "creationDate" end) as "squid:S1130"
from SONAR_ISSUES
group by creationDate
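With 180 rules, writing each COUNT(CASE ...) by hand gets tedious. One possible approach, sketched here with Python's `sqlite3` module, is to read the distinct rules first and generate the column list from them. The helper `quote_ident` and the variable names are assumptions made for this example; the table and column names come from the question.

```python
import sqlite3

# Sample rows from the question: (creationDate, rule, position)
issues = [
    ("01.01.2018", "squid:S1132", 12), ("01.01.2018", "squid:S1132", 14),
    ("01.01.2018", "squid:S1132", 19), ("01.01.2018", "squid:S1121", 12),
    ("01.01.2018", "squid:S1121", 14), ("01.02.2018", "squid:S1130", 12),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE SONAR_ISSUES (creationDate TEXT, rule TEXT, position INTEGER)"
)
conn.executemany("INSERT INTO SONAR_ISSUES VALUES (?, ?, ?)", issues)

# Step 1: collect the distinct rules (sorted for a stable column order).
rules = [
    r for (r,) in conn.execute("SELECT DISTINCT rule FROM SONAR_ISSUES ORDER BY rule")
]


def quote_ident(name):
    # Hypothetical helper: double embedded quotes so the alias stays valid.
    return '"' + name.replace('"', '""') + '"'


# Step 2: one COUNT(CASE ...) per rule; rule values are bound as parameters.
columns = ",\n       ".join(
    f"COUNT(CASE WHEN rule = ? THEN 1 END) AS {quote_ident(rule)}" for rule in rules
)
query = (
    f"SELECT creationDate,\n       {columns}\n"
    "FROM SONAR_ISSUES\nGROUP BY creationDate\nORDER BY creationDate"
)

pivot = conn.execute(query, rules).fetchall()
for row in pivot:
    print(row)
```

The result is still a single query at execution time; only the column list is built in the host language, since SQLite has no built-in dynamic pivot.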

How to write SQL query to calculate instances where a row containing a distinct id occurs 7 days after the first occurrence of the unique id?

I am looking to return a date, the count of unique_ids' first occurrences on that date, the number of unique_ids that occurred 7 days after their first occurrence, and the percentage of occurrences after 7 days / number of first occurrences.
example data_import table
+---------------------+------------------+
| time | distinct_id |
+---------------------+------------------+
| 2018/10/01 | 1 | first instance of `1`
+---------------------+------------------+
| 2018/10/01 | 2 | also first instance, but does not occur 7 days later
+---------------------+------------------+
| 2018/10/02 | 1 | should be disregarded (not first instance of 1)
+---------------------+------------------+
| 2018/10/02 | 3 | first instance of `3`
+---------------------+------------------+
| 2018/10/08 | 1 | First instance 7 days after first instance of `1`
+---------------------+------------------+
| 2018/10/08 | 1 | Don't count as this is the 2nd instance of `1` on this day
+---------------------+------------------+
| 2018/10/09 | 3 | 7 days after first instance of `3`
+---------------------+------------------+
| 2018/10/09 | 1 | 7 days after non-first instance of `1`
+---------------------+------------------+
And the expected return.
+---------------------+----------------------+------------------------+---------------------------+
| time | num_of_1st_instance | num_occur_7_days_after | percent_used_7_days_after |
+---------------------+----------------------+------------------------+---------------------------+
| 2018/10/01 | 2 | 1 | .50 |
+---------------------+----------------------+------------------------+---------------------------+
| 2018/10/02 | 1 | 1 | 1.0 |
+---------------------+----------------------+------------------------+---------------------------+
| 2018/10/03 | 0 | 0 | 0 |
+---------------------+----------------------+------------------------+---------------------------+
The query I have written is close, but counts occurrences other than the first for a distinct_id.
In my example, this query would include the occurrence of distinct_id 1 on 2018/10/02 and its occurrence seven days after 2018/10/02 on 2018/10/09. That is not wanted, as the 2018/10/02 occurrence of distinct_id 1 is not its first.
SELECT
data_import.time AS date,
count(distinct data_import.distinct_id) AS num_installs_on_install_date,
count(distinct future_activity.distinct_id) AS num_occur_7_days_after,
count(distinct future_activity.distinct_id) / count(distinct data_import.distinct_id)::float AS percent_used_7_days_after
FROM data_import
LEFT JOIN data_import AS future_activity ON
data_import.distinct_id = future_activity.distinct_id
AND
DATE(data_import.time) = DATE(future_activity.time) - INTERVAL '7 days'
AND
data_import.time = ( SELECT
time
FROM
data_import
WHERE
distinct_id = future_activity.distinct_id
ORDER BY
time
limit
1 )
GROUP BY DATE(data_import.time)
I hope that I explained this clearly. Please let me know how I can change my current query or a different approach to the solution.

Hmmm. Does this do what you want?
select di.time, count(*) as first_instance,
       sum(flag_7day) as num_after_7_day,
       sum(flag_7day) * 1.0 / count(*) as ratio
from (select di.*,
             row_number() over (partition by distinct_id order by time) as seqnum,
             -- 1 when the same distinct_id shows up again exactly 7 days later
             (case when exists (select 1 from data_import di2 where di2.distinct_id = di.distinct_id and di2.time = di.time + interval '7 day')
                   then 1 else 0
              end) as flag_7day
      from data_import di
     ) di
where seqnum = 1  -- keep only the first occurrence of each distinct_id
group by di.time;
This doesn't return days with no first instances. Those days seem a bit awkward with respect to the ratio, so I'm not 100% sure that you really need them. If you do, it is easy enough to include a generate_series() to generate all dates in the range that you want.
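To sanity-check the intended numbers against the sample table, here is a minimal sketch using Python's `sqlite3` module. It is an illustration under assumptions, not the PostgreSQL query above: dates are rewritten as ISO strings (2018-10-01) so that SQLite's `date()` function can add seven days, and the CTE names `firsts` and `flagged` are invented for this example.

```python
import sqlite3

# Sample rows from the question, with dates rewritten in ISO format.
rows = [
    ("2018-10-01", 1), ("2018-10-01", 2), ("2018-10-02", 1), ("2018-10-02", 3),
    ("2018-10-08", 1), ("2018-10-08", 1), ("2018-10-09", 3), ("2018-10-09", 1),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data_import (time TEXT, distinct_id INTEGER)")
conn.executemany("INSERT INTO data_import VALUES (?, ?)", rows)

query = """
WITH firsts AS (
    -- first day on which each distinct_id appears
    SELECT distinct_id, MIN(time) AS first_day
    FROM data_import
    GROUP BY distinct_id
),
flagged AS (
    -- came_back is 1 when the id has a row exactly 7 days after its first day
    SELECT f.first_day,
           EXISTS (SELECT 1
                   FROM data_import d
                   WHERE d.distinct_id = f.distinct_id
                     AND d.time = date(f.first_day, '+7 days')) AS came_back
    FROM firsts f
)
SELECT first_day, COUNT(*) AS num_first, SUM(came_back) AS num_after_7
FROM flagged
GROUP BY first_day
ORDER BY first_day
"""

# The ratio is computed in Python; num_first is never 0 for a returned day.
report = [(day, n, k, k / n) for day, n, k in conn.execute(query)]
for line in report:
    print(line)
```

As with the query above, days without any first occurrence (2018/10/03 in the expected output) are not returned and would need a generated calendar to appear.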

How can I convert rows to columns in SQL

I've been referencing this question a lot, but my case is a little different so I haven't quite figured it out.
I have a set of data that looks something like this:
---------------------------------------
| Id | Answer | Question | EntryBy |
---------------------------------------
| 1  | John   | Name?    | User1   |
| 2  | 2.4    | Raiting? | User1   |
| 3  | ZH1E4A | UserId?  | User1   |
| 4  | Paul   | Name?    | User1   |
| 5  | 2.3    | Raiting? | User1   |
| 6  | Ron    | Name?    | User2   |
| 7  | 857685 | UserId?  | User2   |
---------------------------------------
I need to pivot the data so that it's structured like so:
----------------------------------------------------------
| Category | Name? | Raiting? | UserId? | EntryBy |
----------------------------------------------------------
| Category1| John | 2.4 | ZH1E4A | User1 |
| Category1| Paul | 2.3 | NULL | User1 |
| Category1| Ron | NULL | 857685 | User2 |
As you can see, there are multiple "Questions" but they don't always have an answer/value. I know the exact number of questions that may be asked/answered so I'm assuming that may help if I used a CASE expression?
Note: The 'Category' column in the last table is just another value similar to 'EntryBy' in the first. I've attempted the pivot approach in the cited question, but the results I get are not correct. I also tried the CASE statement but it resulted in an error since the Questions are titled the same.

On SQL Server 2008 we lose the running SUM() OVER (ORDER BY ...) window function, but we can simulate it with a CROSS APPLY that creates a Grp indicator.
This also assumes the ID is sequential (risky) and that Name? is the group key.
Also, check the spelling of RAITING.
Also, I have no idea where Category is coming from.
Example
Select [Name?] = max(case when Question = 'Name?' then Answer end)
,[Raiting?] = max(case when Question = 'Raiting?' then Answer end)
,[UserId?] = max(case when Question = 'UserId?' then Answer end)
,[EntryBy?] = max([EntryBy])
From (
Select A.*
,B.Grp
From YourTable A
Cross Apply (Select Grp=count(*) from YourTable where Question='Name?' and ID<=A.ID) B
) A
Group By Grp
Returns
Name? Raiting? UserId? EntryBy?
John 2.4 ZH1E4A User1
Paul 2.3 NULL User1
Ron NULL 857685 User2
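The grouping idea can be tried outside SQL Server as well. The sketch below uses Python's `sqlite3` module as a stand-in (an assumption for convenience): a correlated subquery plays the role of the CROSS APPLY and counts how many 'Name?' rows have an Id at or below the current row, which gives every row the Grp of the most recent name.

```python
import sqlite3

# Sample rows from the question: (Id, Answer, Question, EntryBy)
data = [
    (1, "John", "Name?", "User1"), (2, "2.4", "Raiting?", "User1"),
    (3, "ZH1E4A", "UserId?", "User1"), (4, "Paul", "Name?", "User1"),
    (5, "2.3", "Raiting?", "User1"), (6, "Ron", "Name?", "User2"),
    (7, "857685", "UserId?", "User2"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE YourTable (Id INTEGER, Answer TEXT, Question TEXT, EntryBy TEXT)"
)
conn.executemany("INSERT INTO YourTable VALUES (?, ?, ?, ?)", data)

query = """
SELECT MAX(CASE WHEN Question = 'Name?'    THEN Answer END) AS name,
       MAX(CASE WHEN Question = 'Raiting?' THEN Answer END) AS rating,
       MAX(CASE WHEN Question = 'UserId?'  THEN Answer END) AS user_id,
       MAX(EntryBy) AS entry_by
FROM (
    SELECT a.*,
           -- running count of 'Name?' rows up to and including this Id
           (SELECT COUNT(*)
            FROM YourTable b
            WHERE b.Question = 'Name?' AND b.Id <= a.Id) AS grp
    FROM YourTable a
) AS g
GROUP BY grp
ORDER BY grp
"""

pivoted = conn.execute(query).fetchall()
for row in pivoted:
    print(row)
```

Like the original, this relies on Id reflecting entry order and on every group starting with a 'Name?' row.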

This only does a single parse of the table (or "Values Table Expression") for this one, compared to John's, which does 2:
WITH VTE AS (
SELECT *
FROM (VALUES
(1,'John ','Name? ','User1'),
(2,'2.4 ','Raiting?','User1'),
(3,'ZH1E4A','UserId? ','User1'),
(4,'Paul ','Name? ','User1'),
(5,'2.3 ','Raiting?','User1'),
(6,'Ron ','Name? ','User2'),
(7,'857685','UserId? ','User2'),
(8,'Steve ','Name? ','User3'),
(9,'2.5 ','Raiting?','User3'),
(10,'Jane ','Name? ','User3'),
(11,'GA18S1','UserId? ','User3'),
(12,'2.3 ','Raiting?','User3'),
(13,'ABH12D','UserId? ','User3')) V(ID, Answer, Question, EntryBy)),
Groups AS(
SELECT *,
ROW_NUMBER() OVER (ORDER BY ID ASC) -
ROW_NUMBER() OVER (PARTITION BY CASE WHEN Question = 'Name?' THEN 0 ELSE 1 END ORDER BY ID ASC) AS Grp
FROM VTE)
SELECT 'Category1' AS Category,
MAX(CASE Question WHEN 'Name?' THEN Answer ELSE NULL END) AS [Name?],
MAX(CASE Question WHEN 'Raiting?' THEN Answer ELSE NULL END) AS [Raiting?],
MAX(CASE Question WHEN 'UserID?' THEN Answer ELSE NULL END) AS [UserID?],
EntryBy
FROM Groups
GROUP BY CASE Grp WHEN 0 THEN Grp + 1 ELSE Grp END,
EntryBy
ORDER BY CASE Grp WHEN 0 THEN Grp + 1 ELSE Grp END;
I also added a few extra values to display what happens if the sequencing goes wrong.
Result set:
Category Name? Raiting? UserID? EntryBy
--------- ------- -------- ------- -------
Category1 John 2.4 ZH1E4A User1
Category1 Paul 2.3 NULL User1
Category1 Ron NULL 857685 User2
Category1 Steve 2.5 NULL User3
Category1 Jane 2.3 GA18S1 User3
