How to get timespans between entities from timed log table? - sql

I have a log table that stores users' login/logout events. My goal is to calculate how many times each user logged in and how long each login session lasted.
I'm working on a PostgreSQL database. The log table has log_id (PK), user_id (FK), login_state, and created_time. The login_state column is an enum whose value is either 'login' or 'logout'.
For now, I use a self join on the log table, like below.
SELECT A.log_id, A.user_id, A.login_state, A.created_time, B.log_id, B.login_state, B.created_time, (B.created_time-A.created_time) elapsedtime
FROM logtable A INNER JOIN logtable B ON (A.login_state='login' AND B.login_state='logout')
WHERE (A.user_id=B.user_id) AND (A.created_time<=B.created_time);
I got some correct records, but there are also wrong ones.
I think a join may not be the right solution. Each login row should be matched with exactly one logout row, but I couldn't write a query that does this.
The ideal result would be a collection of login-logout pairs, with the elapsed time of each pair, for each user.
Any help is appreciated. Thanks.
============== Add some sample data and expected results =========
Sample Log Table
Expected Results
DB Fiddle for testing
https://www.db-fiddle.com/f/vz6EyKKTg6PWs1X4HbTspB/0

demo:db<>fiddle
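You can use the lead() window function to pull the next row's value into the current row. With the rows partitioned by user_id and ordered by created_time, the row that follows each login is that user's next event, which is the matching logout as long as logins and logouts strictly alternate.
SELECT
    *,
    logout_time - created_time AS elapsed_time
FROM (
    SELECT
        *,
        lead(created_time) OVER (PARTITION BY user_id ORDER BY created_time) AS logout_time
    FROM logtable
) s
WHERE login_state = 'login'
ORDER BY created_time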
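A minimal sketch of the same lead() pairing, run through Python's sqlite3 module so it can be tried without a server. SQLite stands in for PostgreSQL here, the sample rows are hypothetical, and the elapsed time is computed in seconds with strftime because SQLite has no interval type. The extra next_state = 'logout' check is an addition that skips a login which is not followed by a logout.

```python
import sqlite3

# Hypothetical sample rows: (log_id, user_id, login_state, created_time).
ROWS = [
    (1, 10, "login",  "2020-01-01 09:00:00"),
    (2, 20, "login",  "2020-01-01 09:05:00"),
    (3, 10, "logout", "2020-01-01 09:30:00"),
    (4, 20, "logout", "2020-01-01 10:05:00"),
    (5, 10, "login",  "2020-01-01 11:00:00"),
    (6, 10, "logout", "2020-01-01 11:10:00"),
]

QUERY = """
SELECT user_id, created_time, next_time,
       CAST(strftime('%s', next_time) AS INTEGER)
         - CAST(strftime('%s', created_time) AS INTEGER) AS elapsed_seconds
FROM (
    SELECT *,
           lead(created_time) OVER (PARTITION BY user_id ORDER BY created_time) AS next_time,
           lead(login_state)  OVER (PARTITION BY user_id ORDER BY created_time) AS next_state
    FROM logtable
) s
WHERE login_state = 'login' AND next_state = 'logout'
ORDER BY created_time
"""


def login_sessions(rows):
    """Return (user_id, login_time, logout_time, elapsed_seconds) per session."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE logtable (log_id INTEGER PRIMARY KEY, user_id INTEGER, "
        "login_state TEXT, created_time TEXT)"
    )
    con.executemany("INSERT INTO logtable VALUES (?, ?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    for session in login_sessions(ROWS):
        print(session)
```

Counting the rows returned per user_id gives the number of logins for that user.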

Related

How to select users for whom one type of event occurred before another in PostgreSQL?

This is an example of the data structure in my SQL table.
In fact I have many users in my table, and some of them have their steps in the wrong order (user number 2 in the picture). How can I select all such users? The logic is to select every user whose sign_in date is earlier than their registration date. I suppose a regular WHERE clause won't work here. Maybe there is a special function for such cases?
I can see two approaches to solve the problem. For reference, this is how I imagine the table might look:
create table users (
user_id int,
action text,
date decimal
);
Use a self join. Here we fetch the records with the 'registration' action and self join them on a matching user_id and the 'sign_in' action. Because of the join, the data for both actions is available in the same row, which lets you compare them in the WHERE clause.
select u1.*
from users u1
join users u2 on u1.user_id = u2.user_id and u2.action = 'sign_in'
where u1.action = 'registration' and u2.date < u1.date;
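The self join above can be tried on a few hypothetical rows with Python's sqlite3 module. This is only a sketch: SQLite stands in for PostgreSQL, the sample data is invented, and DISTINCT is added so a user with several sign-ins is listed once.

```python
import sqlite3

# Hypothetical rows: (user_id, action, date). User 2 signs in before registering.
EVENTS = [
    (1, "registration", 1), (1, "sign_in", 2), (1, "del_account", 3),
    (2, "sign_in", 1), (2, "registration", 2),
]


def users_signed_in_before_registration(events):
    """Return the user_ids that have a sign_in dated before their registration."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE users (user_id INT, action TEXT, date NUMERIC)")
    con.executemany("INSERT INTO users VALUES (?, ?, ?)", events)
    rows = con.execute(
        """
        SELECT DISTINCT u1.user_id
        FROM users u1
        JOIN users u2 ON u1.user_id = u2.user_id AND u2.action = 'sign_in'
        WHERE u1.action = 'registration' AND u2.date < u1.date
        ORDER BY u1.user_id
        """
    ).fetchall()
    con.close()
    return [r[0] for r in rows]


if __name__ == "__main__":
    print(users_signed_in_before_registration(EVENTS))
```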
Use the crosstab* function of Postgres. It transposes rows into columns, which lets you compare them in the WHERE clause. Personally I think this is more elegant and extensible, in the sense that it allows other comparisons if needed without adding another join. Looking at the cost using EXPLAIN, it came out more efficient as well. Note that the one-argument form of crosstab fills the output columns from left to right, so the query below assumes every user has all three actions; for users with missing actions, the two-argument form of crosstab is safer.
SELECT *
FROM crosstab(
'select user_id, action, date
from users
order by user_id, action'
) AS ct(user_id int, del_account decimal, registration decimal, sign_in decimal)
where sign_in < registration;
*Note: To use crosstab, you may need superuser access to the database to create the extension. You can do so by running the following statement once:
CREATE EXTENSION IF NOT EXISTS tablefunc;
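Where the tablefunc extension is not available, the same rows-to-columns comparison can be approximated with conditional aggregation. The sketch below swaps crosstab for MAX(CASE ...) and runs it in SQLite through Python's sqlite3 module; the sample rows are hypothetical. Unlike the one-argument crosstab, a user with a missing action simply gets NULL in that column.

```python
import sqlite3

# Hypothetical rows: (user_id, action, date). User 3 never registered.
EVENTS = [
    (1, "registration", 1), (1, "sign_in", 2), (1, "del_account", 3),
    (2, "sign_in", 1), (2, "registration", 2),
    (3, "sign_in", 5),
]

PIVOT_QUERY = """
SELECT user_id, registration, sign_in
FROM (
    SELECT user_id,
           MAX(CASE WHEN action = 'registration' THEN date END) AS registration,
           MAX(CASE WHEN action = 'sign_in'      THEN date END) AS sign_in
    FROM users
    GROUP BY user_id
) AS ct
WHERE sign_in < registration
ORDER BY user_id
"""


def pivot_misordered_users(events):
    """Return (user_id, registration, sign_in) where sign_in precedes registration."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE users (user_id INT, action TEXT, date NUMERIC)")
    con.executemany("INSERT INTO users VALUES (?, ?, ?)", events)
    rows = con.execute(PIVOT_QUERY).fetchall()
    con.close()
    return rows


if __name__ == "__main__":
    print(pivot_misordered_users(EVENTS))
```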
Hope this helps. Let me know in the comments if there's any confusion
Your question is a bit vague yet the problem is generic enough.
First let's make your actions comparable and sortable in the right sequence, for example '1.registration', '2.sign_in', '3.del_account' instead of 'registration', 'sign_in', 'del_account'. Even better, use action codes, 2 for sign_in, 1 for registration etc.
Then you can detect misplaced actions and select the list of distinct user_id-s who did them.
select distinct user_id from
(
select user_id,
action > lead(action) over (partition by user_id order by "date") as misplaced
from the_table
) as t
where misplaced;
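As a rough illustration of the action-code idea, the sketch below maps the action names to codes in Python and runs the lead() comparison in SQLite via the sqlite3 module. The code mapping, the table contents, and the use of SQLite instead of PostgreSQL are all assumptions made for the example.

```python
import sqlite3

# Assumed ordering of the steps; the codes themselves are hypothetical.
ACTION_CODES = {"registration": 1, "sign_in": 2, "del_account": 3}

# Hypothetical rows: (user_id, action, date). User 2 signs in before registering.
EVENTS = [
    (1, "registration", 1), (1, "sign_in", 2), (1, "del_account", 3),
    (2, "sign_in", 1), (2, "registration", 2),
]

MISPLACED_QUERY = """
SELECT DISTINCT user_id
FROM (
    SELECT user_id,
           action > lead(action) OVER (PARTITION BY user_id ORDER BY "date") AS misplaced
    FROM the_table
) AS t
WHERE misplaced
ORDER BY user_id
"""


def users_with_misplaced_actions(events):
    """Return user_ids where some action is followed by a lower-coded action."""
    con = sqlite3.connect(":memory:")
    con.execute('CREATE TABLE the_table (user_id INT, action INT, "date" NUMERIC)')
    con.executemany(
        "INSERT INTO the_table VALUES (?, ?, ?)",
        [(uid, ACTION_CODES[action], date) for uid, action, date in events],
    )
    rows = con.execute(MISPLACED_QUERY).fetchall()
    con.close()
    return [r[0] for r in rows]


if __name__ == "__main__":
    print(users_with_misplaced_actions(EVENTS))
```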
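This approach works for any number of action steps, not only 3.
If you map the action column to its expected position with a CASE expression and compare that with the actual position from ROW_NUMBER(), you can find the users whose sign_in date is earlier than their registration date.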
https://dbfiddle.uk/?rdbms=postgres_9.6&fiddle=1e112d51825f5d3185e445d97d4e9c78
select * from (
    select ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY date) as udid,
           case when action = 'registration' then 1
                when action = 'sign_in' then 2
                when action = 'delete' then 3
                else 4
           end as stsord,
           *
    from duptuser
) as drt
where stsord != udid

BigQuery Join Using Most Recent Row

I have seen variations of this question but have been searching StackOverflow for almost a week now trying various solutions and still struggling with this. Really appreciate you taking the time to consider my question.
I am working on a research project in GCP using BigQuery. I have a table result of ~100 million rows of events where there is a session_id column that relates to the session that the event originated from. I would like to join this with another table status of about 40 million rows that has that same session_id and tracks the status of those sessions. Both tables have a time column. In the result table, this is the time of the event. In the status table this is the time of any status changes. I want to join the rows in the result table with the corresponding row in the status table for the most recent state of the session up to or before the time of the event using the session ID. The result would be that each row in the result table would have the corresponding information about the state of the session when the event occurred.
How can I achieve this? Any way to do it that won't be really inefficient? Thank you so much for your help!
You may be able to use a left join. Use lead() to give each status row the time of the next status change, then match each event to the status row whose interval contains the event time:
select r.*, s.status -- choose whatever columns you want
from result r left join
     (select s.*,
             lead(time) over (partition by session_id order by time) as next_time
      from status s
     ) s
     on r.session_id = s.session_id and
        r.time >= s.time and
        (r.time < s.next_time or s.next_time is null)
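A small sketch of this kind of "most recent status at event time" join, using SQLite through Python's sqlite3 module rather than BigQuery. It matches an event to the status row with s.time <= r.time and r.time before the next status change. The table contents and integer timestamps are hypothetical, and nothing here says anything about performance at 100 million rows.

```python
import sqlite3

# Hypothetical status changes: (session_id, time, status).
STATUS = [("A", 10, "open"), ("A", 20, "active"), ("A", 30, "closed"), ("B", 5, "open")]
# Hypothetical events: (session_id, time).
RESULT = [("A", 5), ("A", 10), ("A", 25), ("A", 40), ("B", 7)]

ASOF_QUERY = """
SELECT r.session_id, r.time, s.status
FROM result r
LEFT JOIN (
    SELECT s.*,
           lead(time) OVER (PARTITION BY session_id ORDER BY time) AS next_time
    FROM status s
) s
  ON r.session_id = s.session_id
 AND r.time >= s.time
 AND (r.time < s.next_time OR s.next_time IS NULL)
ORDER BY r.session_id, r.time
"""


def events_with_status(result_rows, status_rows):
    """Attach to each event the status in effect at the event time (None if none)."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE result (session_id TEXT, time INTEGER)")
    con.execute("CREATE TABLE status (session_id TEXT, time INTEGER, status TEXT)")
    con.executemany("INSERT INTO result VALUES (?, ?)", result_rows)
    con.executemany("INSERT INTO status VALUES (?, ?, ?)", status_rows)
    rows = con.execute(ASOF_QUERY).fetchall()
    con.close()
    return rows


if __name__ == "__main__":
    for row in events_with_status(RESULT, STATUS):
        print(row)
```

The event at time 5 in session A happens before any status row exists, so the left join keeps it with a NULL status.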

Select latest and earliest times within a time group and a pivot statement

I have attendance data that contains a username, time, and status (IN or OUT). I want to show attendance data that contains a name and the check in/out times. I expect a person to check in and out no more than twice a day. The data looks like this:
As you can see, my problem is that one person can have multiple entries, a few seconds apart, for the same login attempt. This is because I get the data from a fingerprint attendance scanner, and in some cases the machine makes multiple entries, sometimes just within 5-10 seconds. I want to select the data to be like this:
How can I identify the proper time for the login attempt, and then select the data with a pivot?
First, you need to normalize your data by removing the duplicate entries. In your situation, that's a challenge because the duplicated data isn't easily identified as a duplicate. You can make some assumptions though. Below, I assume that no one will make multiple login attempts in a two minute window. You can do this by first using a Common Table Expression (CTE, using the WITH clause).
Within the CTE, you can use the LAG function. Essentially what this code is saying is "for each partition of user and entry type, if the previous value was within 2 minutes of this value, then put a number, otherwise put null." I chose null as the flag that will keep the value because LAG of the first entry is going to be null. So, your CTE will just return a table of entry events (ID) that were distinct attempts.
Now, you prepare another CTE that a PIVOT will pull from that has everything from your table, but only for the entry IDs you cared about. The PIVOT is going to look over the MIN/MAX of your IN/OUT times.
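WITH UNIQUE_LOGINS AS (
    SELECT ID FROM (
        SELECT ID,
               -- LAG returns NULL for the first entry, so the comparison is unknown and the flag stays NULL
               CASE WHEN LAG(TIME) OVER (PARTITION BY USERNAME, STATUS ORDER BY TIME)
                         + (2/60/24) >= TIME THEN 1 ELSE NULL END AS IS_DUPLICATE -- Times within 2 minutes
        FROM LOGIN_TABLE
    )
    WHERE IS_DUPLICATE IS NULL ), -- analytic functions are not allowed directly in WHERE, hence the subquery
TEMP_FOR_PIVOT AS (
    SELECT USERNAME, TIME, STATUS FROM LOGIN_TABLE WHERE ID IN (SELECT ID FROM UNIQUE_LOGINS)
)
SELECT * FROM TEMP_FOR_PIVOT
PIVOT (
    MIN(TIME) AS FIRST_TIME, MAX(TIME) AS LAST_TIME FOR STATUS IN ('IN' AS CHECK_IN, 'OUT' AS CHECK_OUT)
)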
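The two steps (drop entries that follow the previous one within 2 minutes, then take the MIN/MAX per status) can be sketched outside Oracle as well. The example below uses SQLite through Python's sqlite3 module, with times as plain seconds and conditional aggregation in place of PIVOT, since SQLite has no PIVOT clause. The table contents are hypothetical.

```python
import sqlite3

# Hypothetical scanner entries: (id, username, time_in_seconds, status).
ENTRIES = [
    (1, "alice", 0, "IN"), (2, "alice", 5, "IN"), (3, "alice", 8, "IN"),
    (4, "alice", 1000, "OUT"),
    (5, "alice", 2000, "IN"), (6, "alice", 2004, "IN"),
    (7, "alice", 3000, "OUT"), (8, "alice", 3010, "OUT"),
]

DEDUP_QUERY = """
WITH flagged AS (
    SELECT id, username, status, time,
           lag(time) OVER (PARTITION BY username, status ORDER BY time) AS prev_time
    FROM login_table
),
unique_logins AS (
    -- keep the first entry and any entry more than 120 seconds after the previous one
    SELECT * FROM flagged WHERE prev_time IS NULL OR time - prev_time > 120
)
SELECT username,
       MIN(CASE WHEN status = 'IN'  THEN time END) AS first_in,
       MAX(CASE WHEN status = 'IN'  THEN time END) AS last_in,
       MIN(CASE WHEN status = 'OUT' THEN time END) AS first_out,
       MAX(CASE WHEN status = 'OUT' THEN time END) AS last_out
FROM unique_logins
GROUP BY username
ORDER BY username
"""


def attendance_summary(entries):
    """Return (username, first_in, last_in, first_out, last_out) after de-duplication."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE login_table (id INTEGER PRIMARY KEY, username TEXT, "
        "time INTEGER, status TEXT)"
    )
    con.executemany("INSERT INTO login_table VALUES (?, ?, ?, ?)", entries)
    rows = con.execute(DEDUP_QUERY).fetchall()
    con.close()
    return rows


if __name__ == "__main__":
    print(attendance_summary(ENTRIES))
```

Without the de-duplication step, the last IN and last OUT times would be the repeated scanner entries (2004 and 3010) rather than the start of each attempt.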
From there, if you need to rearrange or rename your columns, then you can just put that last SELECT into yet another CTE and then select your values from it. There is some more about PIVOT here: Rotate/pivot table with aggregation in Oracle

Rails SQL Query return based on not being in join table

I have an Event model. An event can have attendees through the UserEvent join table.
I want a query of events where a specific user id is not an attendee.
I tried Event.joins(:user_events).where('user_events.user_id != ?', user_id) but that clearly doesn't work because if another user attends the Event, it will still list the event because the other user's id clearly isn't equal to user_id. How would I solve this?
You could do a pluck and a where.not with an IN query:
Event.where.not(id: UserEvent.where(user_id: some_user_id).pluck(:event_id))
This returns an ActiveRecord::Relation which you can paginate, sort, etc.
One way to get it down to one query if you don't mind some SQL (the ? placeholder lets Rails quote user_id instead of interpolating it into the SQL string):
Event.where.not("id in (select event_id from user_events where user_id = ?)", user_id)
And another single query alternative, this time using a WHERE NOT EXISTS:
Event.where.not("exists (select 1 from user_events where user_id = ? and user_events.event_id = events.id)", user_id)
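To see what the NOT EXISTS form selects, here is a small sketch of the underlying SQL run in SQLite through Python's sqlite3 module. It is not Rails code; the table and column names follow the Rails conventions used above, and the rows are hypothetical.

```python
import sqlite3

# Hypothetical data: three events; user 7 attends event 1, user 8 attends 1 and 2.
EVENT_ROWS = [(1, "Launch"), (2, "Meetup"), (3, "Workshop")]
USER_EVENT_ROWS = [(7, 1), (8, 1), (8, 2)]

NOT_ATTENDING_QUERY = """
SELECT events.id
FROM events
WHERE NOT EXISTS (
    SELECT 1
    FROM user_events
    WHERE user_events.user_id = ?
      AND user_events.event_id = events.id
)
ORDER BY events.id
"""


def events_not_attended_by(user_id):
    """Return ids of events that user_id does not attend."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, name TEXT)")
    con.execute("CREATE TABLE user_events (user_id INTEGER, event_id INTEGER)")
    con.executemany("INSERT INTO events VALUES (?, ?)", EVENT_ROWS)
    con.executemany("INSERT INTO user_events VALUES (?, ?)", USER_EVENT_ROWS)
    rows = con.execute(NOT_ATTENDING_QUERY, (user_id,)).fetchall()
    con.close()
    return [r[0] for r in rows]


if __name__ == "__main__":
    print(events_not_attended_by(7))
```

Event 1 is excluded for user 7 even though user 8 also attends it, which is the case the joins-based attempt in the question got wrong.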

SQL Query to return both distinct and total logins from a login table in sql2005

I have a table Logins { Id, AccessTime }, I'd like to write a query that returns 3 columns, Total Logins during a time period, Unique Logins for a time period, and the Id of the user.
I know I could do this with two passes and a join, is there a better way to do it in a single pass?
Thanks,
~Prescott
Edit: After considering this further, if they show up at all, then they are a unique login, so I can just grab count(*), userId group by userId.
Sorry for the silliness.
I didn't think this through before asking: essentially, if they have a record in the date period for Count(*), then their unique login count is 1.
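For completeness, a minimal sketch of that single-pass query, run in SQLite through Python's sqlite3 module instead of SQL Server 2005. The rows and the period bounds are hypothetical, and Id is assumed to be the user's id as described in the question.

```python
import sqlite3

# Hypothetical rows: (Id, AccessTime), where Id is the user's id.
LOGIN_ROWS = [
    (1, "2024-01-01"), (1, "2024-01-02"),
    (2, "2024-01-02"),
    (3, "2024-02-01"),
]


def logins_per_user(start, end):
    """Return (Id, total_logins) for logins with start <= AccessTime < end."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE Logins (Id INTEGER, AccessTime TEXT)")
    con.executemany("INSERT INTO Logins VALUES (?, ?)", LOGIN_ROWS)
    rows = con.execute(
        """
        SELECT Id, COUNT(*) AS total_logins
        FROM Logins
        WHERE AccessTime >= ? AND AccessTime < ?
        GROUP BY Id
        ORDER BY Id
        """,
        (start, end),
    ).fetchall()
    con.close()
    return rows


if __name__ == "__main__":
    print(logins_per_user("2024-01-01", "2024-02-01"))
```

Every user that appears in the output counts as one unique login for the period, so the number of rows returned is the number of unique logins.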