We have table with 3 columns- Url of Page Visited, User Session ID and Datetime.
Based on this information we have generate result with 2 columns - Date (unique) and Bounce Rate.
It is very clear that we need to look for single occurrences of session id, if there are 2 entries for same session id it means the user hitted the another page and didn't bounced but one entry means it bounced.
I can not write a sql query for this. I tried grouping data by session id and date but couldn't get the result in required format.
Can anyone do this?
If you want the number of sessions with only one page per day, you can use aggregation:
select dte,
avg( (num_pages = 1)::int ) as bounce_rate
from (select sessionid, min(datetime)::date as dte, count(*) as num_pages
from t
group by sessionid
) t
group by dte;
Related
Been stuck on this issue and could really use a suggestion or help.
What I have in a table is basic user flow on a website. For every Session ID, there's a page visited from start (lands on homepage) to finish (purchase). This has been ordered by timestamp to get a count of pages visited during this process. This 'page count' has also been partitioned by Session ID to go back to 1 every time the ID changes.
What I need to do now is assign a step count (highlighted is what I'm trying to achieve). This should assign a similar count but doesn't continue counting at duplicate steps (ie, someone visited multiple product pages - it's multiple pages but still only one 'product view' step.
You'd think this would be done using a dense rank, partitioned by session id - but that's where I get stuck. You can't order on page count because that'll assign a unique number to each step count. You can't order by Step because that orders it alphabetically.
What could I do to achieve this?
Screenshot of desired outcome:
Many thanks!
Use lag to see if two values are the same then a cumulative sum:
select t.*,
sum(case when prev_cs = custom_step then 0 else 1 end) over (partition by session_id order by timestamp) as steps_count
from (select t.*,
lag(custom_step) over (partition by session_id order by timestamp) as prev_cs
from t
) t
Below is for BigQuery Standard SQL
#standardSQL
SELECT * EXCEPT(flag),
COUNTIF(IFNULL(flag, TRUE)) OVER(PARTITION BY session_id ORDER BY timestamp) AS steps_count
FROM (
SELECT *,
custom_step != LAG(custom_step) OVER(PARTITION BY session_id ORDER BY timestamp) AS flag
FROM `project.dataset.table`
)
-- ORDER BY timestamp
I have been trying to write a query to perfect this instance but cant seem to do the trick because I am still receiving duplicated. Hoping I can get help how to fix this issue.
SELECT DISTINCT
1.Client
1.ID
1.Thing
1.Status
MIN(1.StatusDate) as 'statdate'
FROM
SAMPLE 1
WHERE
[]
GROUP BY
1.Client
1.ID
1.Thing
1.status
My output is as follows
Client Id Thing Status Statdate
CompanyA 123 Thing1 Approved 12/9/2019
CompanyA 123 Thing1 Denied 12/6/2019
So although the query is doing what I asked and showing the mininmum status date per status, I want only the first status date. I have about 30k rows to filter through so whatever does not run overload the query and have it not run. Any help would be appreciated
Use window functions:
SELECT s.*
FROM (SELECT s.*,
ROW_NUMBER() OVER (PARTITION BY id ORDER BY statdate) as seqnum
FROM SAMPLE s
WHERE []
) s
WHERE seqnum = 1;
This returns the first row for each id.
Use whichever of these you feel more comfortable with/understand:
SELECT
*
FROM
(
SELECT *, ROW_NUMBER() OVER (PARTITION BY id ORDER BY statusdate) as rn
FROM sample
WHERE ...
) x
WHERE rn = 1
The way that one works is to number all rows sequentially in order of StatusDate, restarting the numbering from 1 every time ID changes. If you thus collect all the number 1's togetyher you have your set of "first records"
Or can coordinate a MIN:
SELECT
*
FROM
sample s
INNER JOIN
(SELECT ID, MIN(statusDate) as minDate FROM sample WHERE ... GROUP BY ID) mins
ON s.ID = mins.ID and s.StatusDate = mins.MinDate
WHERE
...
This one prepares a list of all the ID and the min date, then joins it back to the main table. You thus get all the data back that was lost during the grouping operation; you cannot simultaneously "keep data" and "throw away data" during a group; if you group by more than just ID, you get more groups (as you have found). If you only group by ID you lose the other columns. There isn't any way to say "GROUP BY id, AND take the MIN date, AND also take all the other data from the same row as the min date" without doing a "group by id, take min date, then join this data set back to the main dataset to get the other data for that min date". If you try and do it all in a single grouping you'll fail because you either have to group by more columns, or use aggregating functions for the other data in the SELECT, which mixes your data up; when groups are done, the concept of "other data from the same row" is gone
Be aware that this can return duplicate rows if two records have identical min dates. The ROW_NUMBER form doesn't return duplicated records but if two records have the same minimum StatusDate then which one you'll get is random. To force a specific one, ORDER BY more stuff so you can be sure which will end up with 1
I'm using postgres to run some analytics on user activity. I have a table of all requests(pageviews) made by every user and the timestamp of the request, and I'm trying to find the number of distinct sessions for every user. For the sake of simplicity, I'm considering every set of requests an hour or more apart from others as a distinct session. The data looks something like this:
id| request_time| user_id
1 2014-01-12 08:57:16.725533 1233
2 2014-01-12 08:57:20.944193 1234
3 2014-01-12 09:15:59.713456 1233
4 2014-01-12 10:58:59.713456 1234
How can I write a query to get the number of sessions per user?
To start a new session after every gap >= 1 hour:
SELECT user_id, count(*) AS distinct_sessions
FROM (
SELECT user_id
,(lag(request_time, 1, '-infinity') OVER (PARTITION BY user_id
ORDER BY request_time)
<= request_time - '1h'::interval) AS step -- start new session
FROM tbl
) sub
WHERE step
GROUP BY user_id
ORDER BY user_id;
Assuming request_time NOT NULL.
Explain:
In subquery sub, check for every row if a new session begins. Using the third parameter of lag() to provide the default -infinity, which is lower than any timestamp and therefore always starts a new session for the first row.
In the outer query count how many times new sessions started. Eliminate step = FALSE and count per user.
Alternative interpretation
If you really wanted to count hours where at least one request happened (I don't think you do, but another answer assumes as much), you would:
SELECT user_id
, count(DISTINCT date_trunc('hour', request_time)) AS hours_with_req
FROM tbl
GROUP BY 1
ORDER BY 1;
I have a table SIGNUPS, where I register all signups to a specific event. Now, I would like to get all people who signed up to an event, with an extra column STATUS telling if the user is actually accepted (STATUS = "OK") or if it is in a waiting list (STATUS="WL"). I tried something like this
SELECT *, IDUSER IN (SELECT IDUSER FROM SIGNUPS ORDER BY DATE ASC LIMIT 10)
as STATUS from SIGNUPS WHERE IDEVENT = 1
This should return STATUS 1 for the first 10 users who signed up, and 0 for all other ones. Unluckily, I get a Mysql error telling me that LIMIT in subqueries is not yet supported.
Could you please suggest another way to get the same information?
Thanks
Something like the following will get what you need - although I haven't tested it against some sample tables. The subqueries find the date above which the last ten signups occur, which is then used to comapre to the date of the current row.
select
s.*,
s.DATE > d.min_date_10 AS STATUS
from SIGNUPS s
join (
select MIN(DATE) AS min_date_10 from (
select DATE from SIGNUPS order by DATE asc LIMIT 10
) a
) d
WHERE IDEVENT = 1
Say I have a voting table, where users can vote on values up, down, or flat.
Say a user gets a point each time the corrcet projection is made.
At the end of each week I want to display some statistics.
Something like:
SELECT user_id, sum( user_points ) as sum_points FROM voting_results
WHERE voting_date > ('2009-09-18' - INTERVAL 1 WEEK)
GROUP BY user_id
ORDER BY sum_points DESC
Fine. This will get me a nice list where the "best guessing" user comes up first.
Here's my question:
How do I - in the same query - go about obtaining how many times each user has voted during the given timeperiod?
Put another way: I want a count - per row - that need to contain the number of rows found with the user_id within the above mentioned query.
Any suggestions?
Thanks.
Just add COUNT(*):
SELECT user_id,
SUM(user_points) as sum_points,
COUNT(*) AS num_votes
FROM voting_results
WHERE voting_date > ('2009-09-18' - INTERVAL 1 WEEK)
GROUP BY
user_id
ORDER BY
sum_points DESC