I have a table of user interactions on a web site and I need to calculate the average time between interactions for each user. To make it simpler to understand, here are some records from the table:
The first column is the user id and the second is the interaction time. The result I need is the average time between interactions for each user. Example:
The average interaction interval of user 12345 is 1 day
I've already tried to use window functions, but I couldn't get the average because PostgreSQL doesn't let me use GROUP BY or AVG on window functions. I could get the intervals using the following command, but couldn't group them by user id.
SELECT INTERACTION_DATE - LAG(INTERACTION_DATE ) OVER (ORDER BY INTERACTION_DATE )
So, I decided to create my own custom function and after that, create a custom aggregate function to do this, and use this function on a group by clause:
CREATE OR REPLACE FUNCTION DATE_INTERVAL(TIMESTAMP)
RETURNS TABLE (USER_INTERVALS INTERVAL)
AS $$
SELECT $1 - LAG($1) OVER (ORDER BY $1)
$$
LANGUAGE SQL
IMMUTABLE;
But this function only returns several rows with a single column containing null.
Is there a better way to do this?
You need to first calculate the difference between the interactions for each row (and user), then you can calculate the average on that:
select user_id, avg(interaction_time)
from (
select user_id,
interaction_date - lag(interaction_date) over (partition by user_id order by interaction_date) as interaction_time
from the_table
) t
group by user_id;
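As a rough, self-contained way to try the same pattern without a PostgreSQL server, the sketch below runs it against an in-memory SQLite database from Python. The sample rows and the use of julianday() to express gaps in days are assumptions for illustration: SQLite has no interval type, and its window functions need SQLite 3.25 or newer.

```python
import sqlite3

# Hypothetical sample data: (user_id, interaction_date).
rows = [
    (12345, "2020-01-01 00:00:00"),
    (12345, "2020-01-02 00:00:00"),
    (12345, "2020-01-03 00:00:00"),
    (67890, "2020-01-01 00:00:00"),
    (67890, "2020-01-01 12:00:00"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE the_table (user_id INTEGER, interaction_date TEXT)")
conn.executemany("INSERT INTO the_table VALUES (?, ?)", rows)

# Inner query: gap to the previous interaction of the same user (in days).
# Outer query: average of those gaps per user.
query = """
SELECT user_id, AVG(gap_days) AS avg_gap_days
FROM (
    SELECT user_id,
           julianday(interaction_date)
             - LAG(julianday(interaction_date))
               OVER (PARTITION BY user_id ORDER BY interaction_date) AS gap_days
    FROM the_table
) AS t
GROUP BY user_id
ORDER BY user_id
"""
avg_gaps = dict(conn.execute(query).fetchall())
print(avg_gaps)
```

The first row of each user has no predecessor, so its gap is NULL, and AVG skips NULL values when averaging.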
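Wrap your first query in a subquery, partition it by user, then compute the average per user (the_table stands in for your table name):
SELECT USER_ID, AVG(InteractionTime) FROM (
SELECT USER_ID, INTERACTION_DATE - LAG(INTERACTION_DATE) OVER (PARTITION BY USER_ID ORDER BY INTERACTION_DATE) AS InteractionTime
FROM the_table
) t
GROUP BY USER_ID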
I have 3 fields: username, tracking_id, timestamp. One user will have multiple rows (some have more, some have less) with different tracking ids and timestamps for each action he has taken on my website. I want to group by the username and get the tracking ids of that user's 10th through 70th action. I use standard SQL on BigQuery.
The first problem is that I can't find syntax to access a range in the STRUCT array (only a single row, or a limit to get the first/last 70 rows, for example). Then, I can imagine that after managing to access a range, there could be an issue with the index being out of bounds, because some users might not have 70 or more actions.
SELECT
username,
ARRAY_AGG(STRUCT(tracking_id,
timestamp)
ORDER BY
timestamp
)[OFFSET (9 to 69)] #??????
FROM
table
The result should be a table with the same 3 fields: username, tracking_id, timestamp, but instead of containing ALL the user's rows, it should only contain each user's 10th to 70th row.
Below is for BigQuery Standard SQL
#standardSQL
SELECT username,
ARRAY_AGG(STRUCT(tracking_id, `timestamp`) ORDER BY `timestamp`) AS selected_actions
FROM (
SELECT * EXCEPT(pos) FROM (
SELECT *, ROW_NUMBER() OVER(PARTITION BY username ORDER BY `timestamp`) pos
FROM `project.dataset.table`
)
WHERE pos BETWEEN 10 AND 70
)
GROUP BY username
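The number-then-filter idea in this answer can be sketched in plain Python. The function name actions_in_range and the sample users are hypothetical, chosen only to show how users with fewer than 70 (or fewer than 10) actions behave:

```python
from collections import defaultdict


def actions_in_range(rows, first=10, last=70):
    """Return each user's first-th through last-th actions (1-based, inclusive)."""
    by_user = defaultdict(list)
    for username, tracking_id, timestamp in rows:
        by_user[username].append((timestamp, tracking_id))

    result = {}
    for username, actions in by_user.items():
        actions.sort()  # order by timestamp, like ROW_NUMBER() ... ORDER BY
        selected = actions[first - 1:last]  # slicing never raises on short lists
        if selected:  # users with fewer than `first` actions drop out entirely
            result[username] = [(tid, ts) for ts, tid in selected]
    return result


# Hypothetical data: "ann" has 80 actions, "bob" has 12, "cy" has 5.
rows = [
    (name, f"{name}-{i}", i)
    for name, n in [("ann", 80), ("bob", 12), ("cy", 5)]
    for i in range(1, n + 1)
]
selected = actions_in_range(rows)
print({user: len(actions) for user, actions in selected.items()})
```

Because the rows are filtered by position rather than indexed, a user with only 12 actions simply contributes actions 10 through 12, and a user with fewer than 10 contributes nothing, so no out-of-bounds case arises.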
Given a Hive table:
create table mock
(user string,
url string
);
How can I sample a certain percentage of URLs (say 50%), or a certain number of URLs, for each user?
Hive has a built-in TABLESAMPLE clause for extracting samples from a table. Note that it samples the table as a whole rather than per user:
SELECT * FROM mock TABLESAMPLE(50 PERCENT)
Here is an alternative solution using row_number(). First, number the rows for each user:
with numbered as (
SELECT user, url, row_number() OVER (PARTITION BY user ORDER BY user) as rn FROM mock
)
Then select either the odd or the even rows using pmod to get a 50% sample:
SELECT user, url FROM numbered where pmod(rn,2) = 0
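To see what the row_number() and pmod combination keeps, here is a small pure-Python sketch of the same logic. The function name sample_every_other and the sample rows are made up for illustration:

```python
from collections import defaultdict


def sample_every_other(rows):
    """Keep the even-numbered rows of each user, mirroring pmod(rn, 2) = 0."""
    counters = defaultdict(int)
    sample = []
    for user, url in rows:
        counters[user] += 1  # plays the role of row_number() per user
        if counters[user] % 2 == 0:
            sample.append((user, url))
    return sample


# Hypothetical rows: u1 has four URLs, u2 has three.
rows = [
    ("u1", "a"), ("u1", "b"), ("u1", "c"), ("u1", "d"),
    ("u2", "x"), ("u2", "y"), ("u2", "z"),
]
sample = sample_every_other(rows)
print(sample)
```

Note that a user with an odd number of rows ends up with slightly less than 50%, and a user with a single row contributes nothing when the even rows are kept.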
I have this table with just one column of type datetime. And I have many events per day.
To get all the events it is just SELECT date FROM table;
How can I add a serial number to each row so that the first row in each day is 1, the second is 2, and so on, resetting the serial count on the next day?
I am looking for a solution using PostgreSQL window functions.
Create dummy data first (this should have been part of the question):
CREATE TABLE "table"
(
date timestamp with time zone
);
insert into "table"(date)
select now()+(x*23||'minutes')::interval
from generate_series(1, 100) as x;
Then query all rows including a counter:
select date, row_number() over (partition by date_trunc('day', date) order by date)
from "table"
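For a quick local check of the per-day numbering, the following sketch runs the same kind of query in an in-memory SQLite database from Python. The events table, its ts column, and the sample timestamps are assumptions; SQLite's date() stands in for PostgreSQL's date_trunc('day', ...), and window functions need SQLite 3.25 or newer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (ts TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?)",
    [
        ("2021-03-01 08:00:00",),
        ("2021-03-01 09:30:00",),
        ("2021-03-01 17:45:00",),
        ("2021-03-02 00:10:00",),
        ("2021-03-02 23:59:00",),
    ],
)

# Number rows within each calendar day, earliest first; the counter
# restarts at 1 whenever the day (the partition key) changes.
numbered = conn.execute(
    """
    SELECT ts,
           ROW_NUMBER() OVER (PARTITION BY date(ts) ORDER BY ts) AS serial
    FROM events
    ORDER BY ts
    """
).fetchall()

for ts, serial in numbered:
    print(ts, serial)
```

The ORDER BY inside the window is what makes the earliest event of each day number 1; without it, the numbering within a day is not guaranteed to follow time.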
I need to write an Oracle SQL query that fetches, from the Tasks table, the top run of approval statuses for each task. For a task, the Approval_Status column starts with some null values, then comes a sequence of approval statuses, then more null values.
Facts
I only need the top sequence of approval statuses
The serial number for each task ID starts from 1 and then continues in sequence: 1, 2, 3, ... and so on
There are thousands of tasks in the table, from T1 to Tn
See the query result below; I need to write a query that returns data in that format
I have heard that analytic functions (i.e. the "PARTITION BY" clause) can be used for this, but I don't know how to use them
Tasks
Query Result
I would really appreciate experts' help in this regard
Thanks
You can do this with analytic functions, but there is a trick. The idea is to look only at rows where approval_status is not null. You want the first group of sequential serial numbers in this group.
The group is identified by the difference between a sequence that enumerates all the rows and the existing serial number. To get the first, use dense_rank(). Finally, choose the first by looking for the ones with a rank equal to 1:
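select t.*
from (select t.*,
             -- diff is constant within a run of consecutive serial numbers
             -- and is largest (closest to zero) for the first run
             dense_rank() over (partition by taskid order by diff desc) as grpnum
      from (select t.*,
                   (row_number() over (partition by taskid order by serial_number) -
                    serial_number
                   ) as diff
            from tasks t
            where approval_status is not null
           ) t
     ) t
where grpnum = 1;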
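The grouping trick can be traced by hand with a short pure-Python sketch for a single task. The function name first_approval_run and the status values are hypothetical. The key here is serial number minus position, which is the SQL expression's diff with the sign flipped, so the first run has the smallest key.

```python
def first_approval_run(rows):
    """rows: (serial_number, approval_status) pairs for ONE task."""
    non_null = sorted(
        (row for row in rows if row[1] is not None),
        key=lambda row: row[0],
    )
    # Within a run of consecutive serial numbers, serial - position is constant;
    # every gap (a skipped null row) increases it.
    keyed = [
        (serial - pos, serial, status)
        for pos, (serial, status) in enumerate(non_null, start=1)
    ]
    if not keyed:
        return []
    first_key = keyed[0][0]  # key of the earliest non-null row
    return [(serial, status) for key, serial, status in keyed if key == first_key]


# Hypothetical task: nulls, then a run of statuses, then nulls and a later status.
task_rows = [
    (1, None), (2, None),
    (3, "Approved"), (4, "Rejected"), (5, "Approved"),
    (6, None), (7, "Approved"), (8, None),
]
first_run = first_approval_run(task_rows)
print(first_run)
```

In this sample the keys are 2, 2, 2 for serial numbers 3 to 5 and 3 for serial number 7, so only the first three rows are returned.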
I have a complex query which may return more than one record per group. There is a field that holds a sequential number. If more than one record is returned for a group, I just want the record with the highest sequential number.
I’ve tried using the SQL MAX function, but if I try to add more than one field it returns all records, instead of the one with the highest sequential field in that group.
I am trying to accomplish this in MS Access.
Edit: 4/5/11
Trying to create a table as an example of what I am trying to do
I have the following table:
tblItemTrans
ItemID(PK)
Eventseq(PK)
ItemTypeID
UserID
Eventseq is a number field that increments for each ItemID. (Don’t ask me why, that’s how the table was created.) Each ItemID can have one or many Eventseq values. I only need the last record (max(Eventseq)) for each ItemTypeID.
Hope this helps any.
SELECT A.*
FROM YourTable A
INNER JOIN (SELECT GroupColumn, MAX(SequentialColumn) AS MaxSeq
FROM YourTable
GROUP BY GroupColumn) B
ON A.GroupColumn = B.GroupColumn AND A.SequentialColumn = B.MaxSeq
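To check the join-on-maximum pattern against known data, here is a minimal sketch using an in-memory SQLite database from Python rather than MS Access. The Payload column and the sample rows are invented for illustration; YourTable, GroupColumn, and SequentialColumn are the placeholder names from the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE YourTable "
    "(GroupColumn TEXT, SequentialColumn INTEGER, Payload TEXT)"
)
conn.executemany(
    "INSERT INTO YourTable VALUES (?, ?, ?)",
    [
        ("A", 1, "a1"), ("A", 2, "a2"),
        ("B", 1, "b1"), ("B", 2, "b2"), ("B", 3, "b3"),
        ("C", 1, "c1"),
    ],
)

# The derived table B holds one row per group with its highest sequence number;
# joining back on both columns keeps only the matching full rows.
latest = conn.execute(
    """
    SELECT A.GroupColumn, A.SequentialColumn, A.Payload
    FROM YourTable AS A
    INNER JOIN (SELECT GroupColumn, MAX(SequentialColumn) AS MaxSeq
                FROM YourTable
                GROUP BY GroupColumn) AS B
      ON A.GroupColumn = B.GroupColumn AND A.SequentialColumn = B.MaxSeq
    ORDER BY A.GroupColumn
    """
).fetchall()
print(latest)
```

Joining on both the group and the maximum is what allows extra columns to be selected without widening the GROUP BY, which is the problem described in the question.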
If your SequentialNumber is an ID (unique across the table), then you could use
select *
from tbl
where seqnum in (
select max(seqnum) from tbl
group by groupcolumn)
If it is not, an alternative to Lamak's query is the Access domain function DMAX
select *
from tbl
where seqnum = DMAX("seqnum", "tbl", "groupcolumn='" & groupcolumn & "'")
Note: if groupcolumn is a date, use # instead of the single quotes ' in the above; if it is numeric, remove the single quotes.