How do I get the average date interval of a column in SQL?

I have a table of user interactions on a web site and I need to calculate the average time between interactions for each user. To make it simpler to understand, here are some records from the table:
The first column is the user id and the second is the interaction time. The result I need is the average time between interactions for each user. Example:
The average interaction interval of user 12345 is 1 day
I've already tried window functions, but I couldn't get the average because PostgreSQL doesn't let me use GROUP BY or AVG directly on a window function. I could get the intervals with the following command, but couldn't group them by user id.
SELECT INTERACTION_DATE - LAG(INTERACTION_DATE ) OVER (ORDER BY INTERACTION_DATE )
So I decided to create my own function, then build a custom aggregate function on top of it and use that in a GROUP BY clause:
CREATE OR REPLACE FUNCTION DATE_INTERVAL(TIMESTAMP)
RETURNS TABLE (USER_INTERVALS INTERVAL)
AS $$
SELECT $1 - LAG($1) OVER (ORDER BY $1)
$$
LANGUAGE SQL
IMMUTABLE;
But this function only returns several rows, each with a single column containing a null value.
Is there a better way to do this?

You need to first calculate the difference between the interactions for each row (and user), then you can calculate the average on that:
select user_id, avg(interaction_time)
from (
select user_id,
interaction_date - lag(interaction_date) over (partition by user_id order by interaction_date) as interaction_time
from the_table
) t
group by user_id;
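To make the two steps concrete (first the gap between consecutive interactions per user, then the average of those gaps), here is a minimal sketch of the same logic in plain Python. The function name and the sample dates are made up for illustration; only the user id 12345 and its 1 day average come from the question.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def average_intervals(rows):
    """Return {user_id: average gap between consecutive interactions}."""
    by_user = defaultdict(list)
    for user_id, ts in rows:
        by_user[user_id].append(ts)

    result = {}
    for user_id, stamps in by_user.items():
        stamps.sort()
        # Same idea as date - lag(date): one gap per consecutive pair.
        gaps = [later - earlier for earlier, later in zip(stamps, stamps[1:])]
        # A user with a single interaction has no gaps, so there is no average
        # (the SQL query returns NULL for such a user).
        result[user_id] = sum(gaps, timedelta()) / len(gaps) if gaps else None
    return result


# Hypothetical sample data.
rows = [
    (12345, datetime(2020, 1, 1)),
    (12345, datetime(2020, 1, 2)),
    (12345, datetime(2020, 1, 3)),
    (67890, datetime(2020, 1, 1)),
]
print(average_intervals(rows))
```

Note that the first row of each user has no previous row, so it contributes no gap; in the SQL version lag() returns NULL there and avg() ignores it.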

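Wrap your first query in a subquery, partition it by user, then compute the average per user:
SELECT USER_ID, AVG(InteractionTime) FROM (
SELECT USER_ID, INTERACTION_DATE - LAG(INTERACTION_DATE) OVER (PARTITION BY USER_ID ORDER BY INTERACTION_DATE) AS InteractionTime
FROM the_table -- placeholder table name; USER_ID is the assumed user column
) t
GROUP BY USER_ID;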

Related

Access 10th through 70th element in STRUCT

I have 3 fields: username, tracking_id, timestamp. One user will have multiple rows (some have more, some have fewer) with different tracking ids and timestamps for each action they have taken on my website. I want to group by the username and get the tracking ids of that user's 10th through 70th action. I use standard SQL on BigQuery.
The first problem is that I can't find syntax to access a range in the STRUCT (only a single row, or a limit to get the first/last 70 rows, for example). Then, I can imagine that after managing to access a range, there could be an issue with the index being out of bounds, because some users might not have 70 or more actions.
SELECT
username,
ARRAY_AGG(STRUCT(tracking_id,
timestamp)
ORDER BY
timestamp
)[OFFSET (9 to 69)] #??????
FROM
table
The result should be a table with the same 3 fields: username, tracking_id, timestamp, but instead of containing ALL the user's rows, it should only contain each user's 10th to 70th row.
Below is for BigQuery Standard SQL
#standardSQL
SELECT username,
ARRAY_AGG(STRUCT(tracking_id, `timestamp`) ORDER BY `timestamp`) AS selected_actions
FROM (
SELECT * EXCEPT(pos) FROM (
SELECT *, ROW_NUMBER() OVER(PARTITION BY username ORDER BY `timestamp`) pos
FROM `project.dataset.table`
)
WHERE pos BETWEEN 10 AND 70
)
GROUP BY username
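On the out-of-bounds worry: filtering on a row number never fails when a user has fewer than 70 actions, it just returns fewer rows. A rough sketch of the same selection in plain Python, with a hypothetical function name and integer timestamps as stand-ins:

```python
from collections import defaultdict


def actions_in_range(rows, first=10, last=70):
    """Return {username: [(tracking_id, timestamp), ...]} for each user's
    first-th through last-th action, ordered by timestamp."""
    by_user = defaultdict(list)
    for username, tracking_id, ts in rows:
        by_user[username].append((ts, tracking_id))

    selected = {}
    for username, actions in by_user.items():
        actions.sort()  # order by timestamp, like ROW_NUMBER() ... ORDER BY
        # Slices are 0-based and end-exclusive, and never raise when the
        # list is shorter than the requested range.
        window = actions[first - 1:last]
        if window:  # users with fewer than `first` actions drop out entirely
            selected[username] = [(tid, ts) for ts, tid in window]
    return selected


# Hypothetical data: alice has 80 actions, bob 12, carol 5.
rows = (
    [("alice", f"a{i}", i) for i in range(1, 81)]
    + [("bob", f"b{i}", i) for i in range(1, 13)]
    + [("carol", f"c{i}", i) for i in range(1, 6)]
)
result = actions_in_range(rows)
print({user: len(actions) for user, actions in result.items()})
```

As in the query, a user with fewer than 10 actions does not appear in the result at all.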

How to implement "group-by" sampling in Hive?

Given a Hive table:
create table mock
(user string,
url string
);
How can I sample a certain percentage of URLs (say 50%), or a certain number of URLs, for each user?
Hive has a built-in TABLESAMPLE clause for extracting a sample from a table, but it samples the table as a whole rather than per user:
SELECT * FROM mock TABLESAMPLE(50 PERCENT)
Here is an alternative solution using row_number(). First, number the rows for each user:
with numbered as (
SELECT user, url, row_number() OVER (PARTITION BY user ORDER BY user) as rn FROM mock
)
Then select either the odd or the even rows, using pmod, to get a 50% sample:
SELECT user, url FROM numbered where pmod(rn,2) = 0
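Taking every second row gives a fixed 50% per user, but which rows you get depends on an arbitrary ordering. If a random sample per user is wanted, the idea can be sketched in plain Python as below. This is a different technique from the pmod filter (a random draw per group); the function name, the seed parameter and the sample rows are all assumptions for illustration.

```python
import random
from collections import defaultdict


def sample_per_user(rows, fraction=0.5, seed=0):
    """Return {user: randomly chosen urls}, keeping `fraction` of each
    user's urls (rounded down)."""
    rng = random.Random(seed)  # seeded so the sketch is repeatable
    by_user = defaultdict(list)
    for user, url in rows:
        by_user[user].append(url)

    sample = {}
    for user, urls in by_user.items():
        k = int(len(urls) * fraction)  # e.g. 3 urls at 50% -> 1 url
        sample[user] = rng.sample(urls, k)
    return sample


# Hypothetical data.
rows = [
    ("u1", "a.com"), ("u1", "b.com"), ("u1", "c.com"), ("u1", "d.com"),
    ("u2", "x.com"), ("u2", "y.com"), ("u2", "z.com"),
]
print(sample_per_user(rows))
```

Rounding down matches what the even-row filter does for a user with an odd number of rows.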

PostgreSQL window function: serial # of event by day among many days

I have this table with just one column of type datetime. And I have many events per day.
To get all the events it is just SELECT date FROM table;
How can I add a serial number to each row so that the first row in each day is 1, the second is 2, and so on, resetting the serial count on the next day?
I am looking for a solution using PostgreSQL window functions.
Create dummy data first (this should have been part of the question):
CREATE TABLE "table"
(
date timestamp with time zone
);
insert into "table"(date)
select now()+(x*23||'minutes')::interval
from generate_series(1, 100) as x;
Then query all rows including a counter:
select date, row_number() over (partition by date_trunc('day', date) order by date)
from "table"
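For anyone who wants to see what "partition by day, number within the day" produces, here is a small plain-Python sketch of the same numbering. The function name and the sample timestamps are hypothetical.

```python
from datetime import datetime
from itertools import groupby


def number_events_per_day(timestamps):
    """Return [(timestamp, serial)], where serial restarts at 1 each day."""
    numbered = []
    ordered = sorted(timestamps)  # groupby needs each day's events adjacent
    for _day, events in groupby(ordered, key=lambda ts: ts.date()):
        for serial, ts in enumerate(events, start=1):
            numbered.append((ts, serial))
    return numbered


# Hypothetical data: three events on one day, two on the next.
events = [
    datetime(2020, 5, 1, 9, 0),
    datetime(2020, 5, 1, 12, 30),
    datetime(2020, 5, 1, 23, 59),
    datetime(2020, 5, 2, 0, 5),
    datetime(2020, 5, 2, 8, 0),
]
for ts, serial in number_events_per_day(events):
    print(ts, serial)
```

The sort plays the role of an ORDER BY inside the window; without an explicit order, the numbering within a day is not guaranteed to be chronological.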

Write Oracle SQL query to fetch from Tasks table top Approval Statuses that appear after some first null value

I need an Oracle SQL query that fetches the top Approval Statuses from the Tasks table: the sequence that appears after some initial null values in the Approval_Status column and is itself followed by more null values.
Facts
I only need the top Approval Statuses sequence.
The serial number for each task ID starts at 1 and continues in sequence: 1, 2, 3, ... and so on.
There are thousands of tasks in the table, from T1 to Tn.
See the Query Result below; I need a query that returns data in that format.
I have heard that analytic functions (i.e. the "PARTITION BY" clause) can be used for this, but I don't know how to use them.
Tasks
Query Result
I would really appreciate expert help with this.
Thanks
You can do this with analytic functions, but there is a trick. The idea is to look only at rows where approval_status is not null. You want the first group of sequential serial numbers in this group.
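The group is identified by the difference between a sequence that enumerates all the remaining rows and the existing serial number: that difference is constant within a run of consecutive serial numbers. Each gap makes serial_number jump ahead of the row number, so the difference shrinks from one run to the next and the first run has the largest value. Rank the groups with dense_rank() in descending order of the difference, then keep the rows whose rank is 1:
select t.*
from (select t.*, dense_rank() over (partition by taskid order by diff desc) as grpnum
from (select t.*,
(row_number() over (partition by taskid order by serial_number) -
serial_number
) as diff
from tasks t
where approval_status is not null
) t
) t
where grpnum = 1;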
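The difference trick is easier to trust once you see the numbers. Below is a plain-Python sketch for a single task, assuming the serial numbers are unique integers; the function name and the sample serials are made up. It takes the serial numbers of the rows whose status is not null and returns the first run of consecutive ones.

```python
def first_run(serial_numbers):
    """Return the first run of consecutive serial numbers.

    `serial_numbers` are the serials of the rows with a non-null status
    for one task (assumed to be unique integers).
    """
    ordered = sorted(serial_numbers)
    # diff = row_number - serial_number: constant inside a run, and it can
    # only decrease at each gap, so the first run has the largest diff.
    diffs = [(rn - sn, sn) for rn, sn in enumerate(ordered, start=1)]
    if not diffs:
        return []
    first_diff = max(d for d, _ in diffs)
    return [sn for d, sn in diffs if d == first_diff]


# Hypothetical task: statuses present on serials 3-5 and again on 9-10.
print(first_run([3, 4, 5, 9, 10]))
```

For serials 3, 4, 5, 9, 10 the differences are -2, -2, -2, -5, -5, so the rows with the largest difference are exactly the first run.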

SQL Max Function per group

I have a complex query which may return more than one record per group. There is a field that holds a sequential number. If more than one record is returned for a group, I just want the record with the highest sequential number.
I’ve tried using the SQL MAX function, but if I try to add more than one field it returns all records, instead of the one with the highest sequential field in that group.
I am trying to accomplish this in MS Access.
Edit: 4/5/11
Trying to create a table as an example of what I am trying to do
I have the following table:
tblItemTrans
ItemID(PK)
Eventseq(PK)
ItemTypeID
UserID
Eventseq is a number field that increments for each ItemID. (Don’t ask me why, that’s how the table was created.) Each ItemID can have one or many Eventseq values. I only need the last record (max(Eventseq)) per ItemTypeID.
Hope this helps any.
SELECT A.*
FROM YourTable AS A
INNER JOIN (SELECT GroupColumn, MAX(SequentialColumn) AS MaxSeq
FROM YourTable
GROUP BY GroupColumn) AS B
ON A.GroupColumn = B.GroupColumn AND A.SequentialColumn = B.MaxSeq
If your SequentialNumber is an ID (unique across the table), then you could use
select *
from tbl
where seqnum in (
select max(seqnum) from tbl
group by groupcolumn)
If it is not, an alternative to the join query above (Lamak's answer) is the Access domain function DMAX:
select *
from tbl
where seqnum = DMAX("seqnum", "tbl", "groupcolumn='" & groupcolumn & "'")
Note: if the groupcolumn is a date, use # instead of the single quotes ' in the above; if it is numeric, remove the single quotes.
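All of the queries above do the same thing: for each group, keep the row whose sequence number is the highest. A minimal plain-Python sketch of that logic, using the column names from the question; the function name and the sample rows are hypothetical.

```python
def latest_per_group(rows, group_key, seq_key):
    """Return one row per group: the one with the highest seq_key value."""
    best = {}
    for row in rows:
        group = row[group_key]
        # Keep the row seen so far with the largest sequence number.
        if group not in best or row[seq_key] > best[group][seq_key]:
            best[group] = row
    return list(best.values())


# Hypothetical rows shaped like tblItemTrans.
rows = [
    {"ItemID": 1, "Eventseq": 1, "ItemTypeID": "A", "UserID": "u1"},
    {"ItemID": 1, "Eventseq": 2, "ItemTypeID": "A", "UserID": "u2"},
    {"ItemID": 2, "Eventseq": 1, "ItemTypeID": "B", "UserID": "u1"},
]
for row in latest_per_group(rows, "ItemTypeID", "Eventseq"):
    print(row)
```

One difference to be aware of: if two rows in a group tie on the highest sequence number, this sketch keeps only the first one it sees, while the join query returns both.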