I'm trying to query event data from Firebase. The goal is to get the last event for each user whose event sequence starts with event a. The events are ordered by time. I have tried some approaches with lead, join, etc., but couldn't produce the desired result.
Example data:
user_id event_name
1 a
1 b
1 c
2 b
2 a
3 a
4 a
4 b
The ideal output:
user_id event_name
1 c
3 a
4 b
The events are ordered by time.
So I assume you have a column named something like time.
Consider the approach below:
select user_id, event_name
from your_table
where true
qualify 1 = row_number() over(partition by user_id order by time desc)
and 'a' = first_value(event_name) over(partition by user_id order by time)
If applied to the sample data in your question, the output matches the ideal output shown there.
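As a rough way to check this logic against the sample data, here is a sketch in Python using the standard-library sqlite3 module. SQLite has no QUALIFY clause, so the two window conditions move into a subquery and are filtered in an outer WHERE. The table name events and the integer time column are assumptions, since the question does not show the real time column.

```python
import sqlite3

# Sample rows from the question. The integer `time` column is an assumption
# that simply preserves the order in which the events were listed.
ROWS = [
    (1, "a", 1), (1, "b", 2), (1, "c", 3),
    (2, "b", 1), (2, "a", 2),
    (3, "a", 1),
    (4, "a", 1), (4, "b", 2),
]

# Same two window conditions as the QUALIFY version, moved into a subquery.
QUERY = """
select user_id, event_name
from (
    select user_id, event_name,
           row_number() over (partition by user_id order by time desc) as rn,
           first_value(event_name) over (partition by user_id order by time) as first_event
    from events
)
where rn = 1 and first_event = 'a'
order by user_id
"""


def last_event_for_a_starters(rows):
    """Return the last event of every user whose first event is 'a'."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table events (user_id integer, event_name text, time integer)")
    conn.executemany("insert into events values (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


print(last_event_for_a_starters(ROWS))
```

User 2 drops out because that user's first event is b, while user 3 is kept with a single event that is both first and last.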
I have a dataset as this:
ID SESSION DATE
1 A 2021/1/1
1 A 2021/1/2
1 B 2021/1/3
1 B 2021/1/4
1 A 2021/1/5
1 A 2021/1/6
What I want to create is the GROUP column, which assigns the same number to consecutive rows where the ID and SESSION columns are the same, as below:
ID SESSION DATE GROUP
1 A 2021/1/1 1
1 A 2021/1/2 1
1 B 2021/1/3 2
1 B 2021/1/4 2
1 A 2021/1/5 3
1 A 2021/1/6 3
Does anyone know how to do this in SQL efficiently? I have about 5 billion rows. Thank you in advance!
You have a kind of gaps-and-islands problem. You can create your groupings by counting each time the session changes, using lag, like so:
select Id, Session, Date,
Sum(case when session = prevSession then 0 else 1 end) over(partition by Id order by date) "Group"
from (
select *,
Lag(Session) over(partition by Id order by date) prevSession
from t
)t;
Example Fiddle using MySQL, but this is ANSI SQL that should work in most DBMSs.
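To see the running count at work on the sample rows, here is a small sketch in Python with the standard-library sqlite3 module. The dates are rewritten in ISO form so that they sort correctly as text, and the output column is called grp; both choices are assumptions made for the sketch, not part of the question.

```python
import sqlite3

# Sample rows from the question, with dates rewritten as ISO strings
# (an assumption, so that text ordering matches date ordering).
ROWS = [
    (1, "A", "2021-01-01"), (1, "A", "2021-01-02"),
    (1, "B", "2021-01-03"), (1, "B", "2021-01-04"),
    (1, "A", "2021-01-05"), (1, "A", "2021-01-06"),
]

# Inner query: fetch the previous session per id.
# Outer query: add 1 to a running total whenever the session changes.
QUERY = """
select id, session, date,
       sum(case when session = prev_session then 0 else 1 end)
           over (partition by id order by date) as grp
from (
    select *, lag(session) over (partition by id order by date) as prev_session
    from t
)
order by id, date
"""


def assign_groups(rows):
    """Return (id, session, date, grp) with one grp number per run of sessions."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table t (id integer, session text, date text)")
    conn.executemany("insert into t values (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


for row in assign_groups(ROWS):
    print(row)
```

The first row of each id has no previous session, so the comparison yields NULL and the CASE falls through to 1, which starts the numbering.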
I work in healthcare. In a Postgres database, we have a table of member IDs and dates. I'm trying to pull the latest two dates for each member ID.
Simplified sample data:
A 1
B 1
B 2
C 1
C 5
C 7
D 1
D 2
D 3
D 4
Desired result:
A 1
B 1
B 2
C 1
C 5
D 1
D 2
I get a strong feeling this is for a homework assignment and would recommend that you look into partitioning and specifically rank() function by yourself first before looking at my solution.
Moreover, you have not specified how you received the initial result you provided, so I'll have to assume you just did select letter_column, number_column from my_table; to achieve the result.
So, what you actually want here is to partition the initial query result into groups by letter_column and select the first two rows in each. The rank() function lets you assign each row a number, counting within groups:
select letter_column,
number_column,
rank() over (partition by letter_column order by number_column) as rank
from my_table;
Since it's a window function, you can't use it in a WHERE predicate of the same query, so you'll have to build another query around this one, this time keeping only the rows whose rank is at most 2:
with ranked_results as (select letter_column,
number_column,
rank() over (partition by letter_column order by number_column asc) as rank
from my_table mt)
select letter_column,
number_column
from ranked_results
where rank < 3;
Here's an SQLFiddle to play around: http://sqlfiddle.com/#!15/e90744/1/0
Hope this helps!
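One detail to be aware of with this approach: rank() gives tied values the same number, so a member with duplicate values can return more than two rows, while row_number() always caps it at two. The sketch below, in Python with the standard-library sqlite3 module, runs the same CTE with either function so the difference can be observed. The helper name first_two_per_member and the tied member E are made up for illustration.

```python
import sqlite3

# Sample data from the question.
SAMPLE = [("A", 1), ("B", 1), ("B", 2), ("C", 1), ("C", 5), ("C", 7),
          ("D", 1), ("D", 2), ("D", 3), ("D", 4)]

# Hypothetical member with a tie on the second value.
TIED = [("E", 1), ("E", 2), ("E", 2)]


def first_two_per_member(rows, ranking="rank"):
    """Keep rows numbered 1 or 2 within each letter_column partition."""
    if ranking not in ("rank", "row_number"):
        raise ValueError("ranking must be 'rank' or 'row_number'")
    conn = sqlite3.connect(":memory:")
    conn.execute("create table my_table (letter_column text, number_column integer)")
    conn.executemany("insert into my_table values (?, ?)", rows)
    # The function name is checked against a whitelist above, so
    # interpolating it into the SQL text is safe here.
    query = f"""
        with ranked_results as (
            select letter_column, number_column,
                   {ranking}() over (partition by letter_column
                                     order by number_column) as rnk
            from my_table
        )
        select letter_column, number_column
        from ranked_results
        where rnk < 3
        order by letter_column, number_column
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


print(first_two_per_member(SAMPLE))
print(len(first_two_per_member(TIED, "rank")))
print(len(first_two_per_member(TIED, "row_number")))
```

On the question's sample data there are no ties, so both functions return the same rows.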
I have a table like so:
id device group
-----------------
1 a 1000
2 a 1000
3 b 1001
4 b 1001
5 b 1001
6 b 1002
8 a 1003
9 a 1003
10 a 1003
11 a 1003
12 b 1004
13 b 1004
All ids and groups are sequential. What I would like is to select id and device based on groups and devices. Think of it as a pagination type of selection. Getting the last group is a simple inner selection, but how do I select the second-last group, or the third-last group, etc.?
I tried the row number function like this:
SELECT * FROM
( SELECT *, ROW_NUMBER() OVER (PARTITION BY device ORDER BY group DESC) rn FROM data) tmp
WHERE rn = 1;
...but changing rn gives me the previous id, not the previous group.
I would like to end up with a selection that could accommodate these results:
device = a, group = latest:
id device group
10 a 1003
11 a 1003
device = a, group = latest - 1:
id device group
1 a 1000
2 a 1000
Does anyone know how to accomplish this?
Edit:
The use case is a GPS-enabled device in a car, sending data every 30 seconds. Imagine going on a drive today. First you go to the shops, then you go home. The first trip is you driving to the shop. The second trip is you driving back. I want to show those trips on a map, but that means I need to identify your last trip, and then the trip before it, and so on, until you run out of trips.
You can try this approach:
with x as (
select distinct page
from test_table),
y as (
select x.page
,row_number() over (order by page desc) as row_num
from x)
select test_table.* from test_table join y on y.page = test_table.page
where y.row_num = 2
I will try to explain what I did here.
The first block (x) returns the distinct groups (pages in my case).
The second block (y) assigns row numbers to the groups according to their rank. In this case the ranking is in descending order of the pages.
Finally, the third block selects the rows for the desired page. If you want the penultimate page, use row_num = 2; for the third from last, use row_num = 3, and so on.
You can play around with the values here: http://sqlfiddle.com/#!15/190c06/26
Use dense_rank():
select d.*
from (select d.*, dense_rank() over (order by group_id desc) as seqnum
from data d
where device = 'a'
) d
where seqnum = 2;
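Here is a sketch of the dense_rank() idea in Python with the standard-library sqlite3 module, using the table from the question. The column is named group_id, as in the query above, because group is a reserved word. The helper name nth_latest_group and its parameters are made up for illustration.

```python
import sqlite3

# Rows from the question's table: (id, device, group_id).
ROWS = [
    (1, "a", 1000), (2, "a", 1000),
    (3, "b", 1001), (4, "b", 1001), (5, "b", 1001),
    (6, "b", 1002),
    (8, "a", 1003), (9, "a", 1003), (10, "a", 1003), (11, "a", 1003),
    (12, "b", 1004), (13, "b", 1004),
]

# dense_rank() numbers the distinct groups of one device from newest (1)
# to oldest, without gaps, so seqnum = n picks the n-th latest group.
QUERY = """
select id, device, group_id
from (select d.*, dense_rank() over (order by group_id desc) as seqnum
      from data d
      where device = ?
     ) d
where seqnum = ?
order by id
"""


def nth_latest_group(rows, device, n):
    """Return all rows of the n-th latest group for the given device."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table data (id integer, device text, group_id integer)")
    conn.executemany("insert into data values (?, ?, ?)", rows)
    result = conn.execute(QUERY, (device, n)).fetchall()
    conn.close()
    return result


print(nth_latest_group(ROWS, "a", 1))
print(nth_latest_group(ROWS, "a", 2))
```

Because every row in a group shares the same dense rank, the whole trip comes back in one query, and stepping n walks backwards through the trips.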
WHAT IS THIS POST FOR?
In BigQuery, I need to remove duplicated rows, with one caveat:
the duplication happened within the same visit of a visitor, on consecutive page views with the same pagename.
GROUP BY DOES NOT RESOLVE THE ISSUE
Below, I have explained the data, the issue, a possible resolution and the measures taken, to the best of my ability.
WHAT DATA AM I USING?
Adobe data imported to BigQuery (no issues in importing)
Each row is the data collected by page-view for a visitor.
Each time a visitor moves to new page it counts a visit_page_num + 1 in the same visit_num and for same visitor_id.
The pagename is recorded for each page visited.
ISSUE :
In the data below,
some of the visit_page_num values are counted as unique for a given visitor_id and visit_num, but they are duplicates because the pagename is the same.
ISSUE
visit_page_num pagename
1 a
2 b
3 c -issue
4 c -issue
5 d
6 d
7 d
8 e
9 c -issue
10 c -issue
11 c -issue
Solution so far with GROUP BY
visit_page_num pagename
1 a
2 b
3 c -issue
5 d
8 e
GOAL
visit_page_num pagename
1 a
2 b
3 c -Goal
5 d
8 e
9 c -Goal
When duplicates of the same pagename occur at a different time in the visit, how do we ensure that the later duplicates are not eliminated but counted as a different page visit?
QUERY USED :
SELECT visitor_id
,visit_num
,pagename
,first (visit_page_num) AS first
,ROW_NUMBER() OVER(PARTITION BY visitor_id, visit_num ORDER BY visitor_id, visit_num, pagename) AS int_var
FROM [table]
GROUP BY visitor_id, visit_num, pagename
ORDER BY visitor_id, visit_num, first
OUTPUT :
Everything is good EXCEPT for
visitor_id = A on visit_num = 1 and pagename = c
ACTIONS TAKEN :
I have tried the LEAD and LAG function with MIN and MAX function in second step>>> SAME OUTPUT
CHECKED the web, normal SQL functions that can be translated into BIGQUERY >> SAME OUTPUT
Asked my team lead >> SAME OUTPUT
5 hours of experimenting >> SAME OUTPUT
CAVEAT
Cannot use field_date OR any time field OR any other field beside the one mentioned in the table
Try keeping only the transitions between pages, i.e. the rows where pagename differs from the previous row in the same visit, e.g.
SELECT
visitor_id
, visit_num
, visit_page_num
, pagename
FROM (
select
*
, lag(pagename) over(partition by visitor_id, visit_num order by visit_page_num)
as prev_page
from table1
) derived
WHERE prev_page <> pagename or prev_page IS NULL
ORDER BY visitor_id, visit_num, visit_page_num
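A sketch of the transition idea on the question's ISSUE table, in Python with the standard-library sqlite3 module. It compares each row with the previous one using lag, so the first row of each run of equal pagenames survives, which is what the GOAL table in the question shows. The table name page_views and the single visitor A with visit 1 are assumptions made for the sketch.

```python
import sqlite3

# The ISSUE table from the question, for one assumed visitor and visit.
PAGES = ["a", "b", "c", "c", "d", "d", "d", "e", "c", "c", "c"]
ROWS = [("A", 1, i + 1, page) for i, page in enumerate(PAGES)]

# Keep a row only when its pagename differs from the previous row of the
# same visit, or when there is no previous row at all.
QUERY = """
select visit_page_num, pagename
from (
    select *,
           lag(pagename) over (partition by visitor_id, visit_num
                               order by visit_page_num) as prev_page
    from page_views
)
where prev_page is null or prev_page <> pagename
order by visitor_id, visit_num, visit_page_num
"""


def collapse_consecutive_duplicates(rows):
    """Return (visit_page_num, pagename) for the first row of each run."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table page_views "
        "(visitor_id text, visit_num integer, visit_page_num integer, pagename text)"
    )
    conn.executemany("insert into page_views values (?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


print(collapse_consecutive_duplicates(ROWS))
```

Unlike GROUP BY pagename, this keeps the second run of c that starts at visit_page_num 9, because the comparison is only against the immediately preceding row.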
I have three tables: an Objects table, a Status table and a StatusTypes table.
An object has multiple statuses, each of which has a status type. I would like to create a view that gives me the object's ID, the most important status description (found in the StatusTypes table), and the most important status date (found in the Status table).
The part I am getting hung up on is finding the most important status. It must first be sorted by the latest date, then by an integer weighting (Priority) in the Status table, then again by another weighting (Weighting) in the StatusTypes table.
What would be the best SQL statement to deliver these results quickly?
Objects
ID Acquisition Date Serial Number
127237 1997-04-21 2151513515
127239 1997-10-31 2151513523
127242 1998-01-20 2165588481
127272 1998-10-20 2195689842
127286 1999-06-15 2231549489
127291 1999-06-01 2229564978
Status
ID ObjectID Priority StatusMessage Date Status
1 127237 1 Online 22.02.12 07.01.00 1
2 127237 3 Job Received 22.02.12 07.01.00 3
3 127237 5 Job Started 22.02.12.07.01.00 3
4 127237 5 Jam 22.02.12.07.01.00 2
5 127286 1 Online 22.02.12.07.09.00 1
Status Types
ID Description Weighting
1 Idle 0
2 Error 9
3 Working 5
Expected Output
ID Status Date
127237 Error 22.02.12 07.01.00
127286 Idle 22.02.12.07.09.00
Sounds like you could use ROW_NUMBER():
SELECT *
FROM (SELECT *,ROW_NUMBER() OVER(PARTITION BY ID ORDER BY Date DESC, Priority, Weighting) 'RowRank'
FROM YourTable a
)sub
WHERE RowRank = 1
Obviously, replace YourTable with the relevant JOINs.
The ROW_NUMBER() function assigns a number to each row. PARTITION BY is optional, but is used to restart the numbering for each value in that group, i.e. if you PARTITION BY ID, then for each unique ID value the numbering starts over at 1. ORDER BY defines the order in which rows are numbered, and is required in the ROW_NUMBER() function.
Updated with your data:
SELECT ObjectID,Description,Date
FROM (SELECT a.*,b.Description,ROW_NUMBER() OVER(PARTITION BY a.ObjectID ORDER BY CONVERT(DATE,LEFT([Date],8),4) DESC, Priority DESC, Weighting DESC) 'RowRank'
FROM Status a
JOIN Status_Types b
ON a.Status = b.ID
)sub
WHERE RowRank = 1
Demo: SQL Fiddle
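For readers without SQL Server at hand, here is a sketch of the same ROW_NUMBER() ordering in Python with the standard-library sqlite3 module. It assumes the dates in the question are in dd.mm.yy form and stores them as ISO timestamps so they sort as text, instead of using CONVERT. The snake_case table and column names are also assumptions made for the sketch.

```python
import sqlite3

# Status rows from the question: (id, object_id, priority, message, date, status).
# Dates are rewritten as ISO timestamps, assuming the originals are dd.mm.yy.
STATUS = [
    (1, 127237, 1, "Online", "2012-02-22 07:01:00", 1),
    (2, 127237, 3, "Job Received", "2012-02-22 07:01:00", 3),
    (3, 127237, 5, "Job Started", "2012-02-22 07:01:00", 3),
    (4, 127237, 5, "Jam", "2012-02-22 07:01:00", 2),
    (5, 127286, 1, "Online", "2012-02-22 07:09:00", 1),
]

# Status types from the question: (id, description, weighting).
STATUS_TYPES = [(1, "Idle", 0), (2, "Error", 9), (3, "Working", 5)]

# Number the statuses of each object by date, then priority, then weighting
# (all descending) and keep the row numbered 1.
QUERY = """
select object_id, description, date
from (
    select s.object_id, st.description, s.date,
           row_number() over (partition by s.object_id
                              order by s.date desc, s.priority desc,
                                       st.weighting desc) as row_rank
    from status s
    join status_types st on s.status = st.id
)
where row_rank = 1
order by object_id
"""


def most_important_status(status_rows, type_rows):
    """Return (object_id, description, date) of the top status per object."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table status (id integer, object_id integer, priority integer, "
        "message text, date text, status integer)"
    )
    conn.execute("create table status_types (id integer, description text, weighting integer)")
    conn.executemany("insert into status values (?, ?, ?, ?, ?, ?)", status_rows)
    conn.executemany("insert into status_types values (?, ?, ?)", type_rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


print(most_important_status(STATUS, STATUS_TYPES))
```

For object 127237 all four rows share one timestamp, two of them tie on priority 5, and the weighting of Error (9) over Working (5) breaks the tie.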