Show only latest login from inner join SQL statement - sql

I'm relatively new to SQL and I have the following query to get a list of logins since Jan 1st. I'm trying to only display each user's last login.
SELECT usrlogs.serverlogintime AS Login_Date,
usrlogs.usrname AS User_Name,
usrlogs.usrid AS User_ID,
usrlogs.usrlogid AS Log_ID,
users.status AS Active
FROM usrlogs
INNER JOIN users
ON usrid = uid
WHERE DATE_FORMAT (ServerLoginTime,'%Y-%m-%d') >= '2022-01-01' and status="0"
User_Log_ID increases by 1 with each new login to the server. Is there a way to only display each user's highest Log ID?

You can use a subselect to get the highest usrlogid for each user, then select the usrlogs rows with that ID:
SELECT u.serverlogintime AS Login_Date,
u.usrname AS User_Name,
u.usrid AS User_ID,
u.usrlogid AS Log_ID,
users.status AS Active
FROM usrlogs u
INNER JOIN (SELECT MAX(usrlogid) as usrlogid,usrid FROM usrlogs GROUP BY usrid) u1 ON u1.usrid = u.usrid AND u1.usrlogid = u.usrlogid
INNER JOIN users
ON u.usrid = users.uid
WHERE DATE_FORMAT (u.ServerLoginTime,'%Y-%m-%d') >= '2022-01-01' and status="0"

You need to use Row_Number() like this:
SELECT * FROM (
SELECT usrlogs.serverlogintime AS Login_Date,
usrlogs.usrname AS User_Name,
usrlogs.usrid AS User_ID,
usrlogs.usrlogid AS Log_ID,
users.status AS Active,
Row_number() over (partition by usrlogs.usrid order by usrlogs.usrlogid desc ) rw
FROM usrlogs
INNER JOIN users
ON usrid = uid
WHERE DATE_FORMAT (ServerLoginTime,'%Y-%m-%d') >= '2022-01-01' and status="0"
) t where t.rw=1
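To try the ROW_NUMBER() approach without a MySQL server, here is a minimal sketch using Python's built-in sqlite3 module. The table contents are made up for illustration, the date filter is a plain string comparison instead of DATE_FORMAT, and it assumes the bundled SQLite is 3.25 or newer (needed for window functions).

```python
import sqlite3

# Hypothetical miniature versions of the users and usrlogs tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (uid INTEGER PRIMARY KEY, status TEXT);
CREATE TABLE usrlogs (usrlogid INTEGER PRIMARY KEY, usrid INTEGER,
                      usrname TEXT, serverlogintime TEXT);
INSERT INTO users VALUES (1, '0'), (2, '0'), (3, '1');
INSERT INTO usrlogs VALUES
  (1, 1, 'ann', '2021-12-30 08:00:00'),
  (2, 1, 'ann', '2022-01-05 09:00:00'),
  (3, 2, 'bob', '2022-01-06 10:00:00'),
  (4, 1, 'ann', '2022-02-01 11:00:00'),
  (5, 3, 'cat', '2022-02-02 12:00:00'),
  (6, 2, 'bob', '2022-03-01 13:00:00');
""")

# Number each user's logins from newest to oldest, then keep row 1.
latest_logins = conn.execute("""
SELECT Login_Date, User_Name, User_ID, Log_ID
FROM (
  SELECT l.serverlogintime AS Login_Date,
         l.usrname AS User_Name,
         l.usrid AS User_ID,
         l.usrlogid AS Log_ID,
         ROW_NUMBER() OVER (PARTITION BY l.usrid
                            ORDER BY l.usrlogid DESC) AS rw
  FROM usrlogs l
  INNER JOIN users u ON l.usrid = u.uid
  WHERE l.serverlogintime >= '2022-01-01' AND u.status = '0'
) t
WHERE t.rw = 1
ORDER BY User_ID
""").fetchall()

for row in latest_logins:
    print(row)
```

With this sample data, each active user appears once with their highest Log_ID; the user with status '1' is filtered out.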

Related

Calculate rolling year totals in sql

I am gathering something that is essentially an "enrollment date" for users. The "enrollment date" is not stored in the database (for a reason too long to explain here), so I have to deduce it from the data. I then want to reuse this CTE in numerous places throughout another query to gather values such as "total orders 1 year before enrollment" and "total orders 1 year after enrollment".
I haven't gotten this code to run, as it's much more complex in my actual data set (this code is paraphrased from the actual code) and I have a feeling it's not the best way to do this. As you can see, my date conditionals are mostly just placeholders, but I think it should be obvious what I am trying to do.
That said, I think this would mostly work. My question is, is there a better way to do this? Additionally, could I combine the rolling year before and rolling year after into one table somehow? (maybe window functions)? This is part of a much bigger query, so the more consolidation I could do, the better it would seem.
For what it's worth, the subquery to derive the "enrollment date" is also more complex than shown here.
With enroll as (Select
user_id,
MIN(date) as e_date
FROM `orders` o
WHERE (subscribed = True)
group by user_id
)
Select*
from users
left join (select
user_id,
SUM(total_paid)
from orders where date > (select enroll.e_date where user_id = user_id) AND date < (select enroll.e_date where user_id = user_id + 365 days)
and order_type = 'special'
group by user_id
) as rolling_year_after on rolling_year_after.user_id = users.user_id
left join (select
user_id,
SUM(total_paid)
from orders where date < (select enroll.e_date where user_id = user_id) and date > (select enroll.e_date where user_id = user_id - 365 days)
and order_type = 'special'
group by user_id
) as rolling_year_before on rolling_year_before.user_id = users.user_id
Maybe something like this. I'm not sure if it's more performant, but it looks a bit cleaner:
With enroll as (Select
user_id,
MIN(date) as e_date
FROM `orders` o
WHERE (subscribed = True)
group by user_id
)
, rolling_year as (
select
orders.user_id,
-- "+ 365 days" / "- 365 days" are placeholders from the question; use your dialect's date arithmetic
SUM(CASE WHEN date between enroll.e_date and enroll.e_date + 365 days then (total_paid) else 0 end) as rolling_year_after,
SUM(CASE WHEN date between enroll.e_date - 365 days and enroll.e_date then (total_paid) else 0 end) as rolling_year_before
from orders
left join enroll
on orders.user_id = enroll.user_id
where order_type = 'special'
group by orders.user_id
)
Select *
from users
left join rolling_year
on users.user_id = rolling_year.user_id
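Since the date arithmetic above is only a placeholder, here is a small runnable sketch of the same conditional-aggregation idea using Python's sqlite3 module. The sample rows are invented, SQLite's date(x, '+365 days') stands in for whatever your database offers, and the enrollment day itself is excluded from both windows (as in the question's strict comparisons), which is an assumption you may want to change.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (user_id INTEGER PRIMARY KEY);
CREATE TABLE orders (user_id INTEGER, date TEXT, total_paid REAL,
                     order_type TEXT, subscribed INTEGER);
INSERT INTO users VALUES (1), (2);
INSERT INTO orders VALUES
  (1, '2020-03-01', 10.0, 'special', 0),
  (1, '2021-01-01', 20.0, 'special', 1),  -- first subscribed order
  (1, '2021-06-01', 30.0, 'special', 1),
  (1, '2022-06-01', 40.0, 'special', 1),  -- more than a year later
  (1, '2021-02-01', 99.0, 'regular', 1),  -- wrong order_type
  (2, '2021-05-01',  5.0, 'special', 0);  -- user 2 never subscribed
""")

rolling_totals = conn.execute("""
WITH enroll AS (
  SELECT user_id, MIN(date) AS e_date
  FROM orders
  WHERE subscribed = 1
  GROUP BY user_id
),
rolling_year AS (
  SELECT o.user_id,
         SUM(CASE WHEN o.date > e.e_date
                   AND o.date <= date(e.e_date, '+365 days')
                  THEN o.total_paid ELSE 0 END) AS rolling_year_after,
         SUM(CASE WHEN o.date < e.e_date
                   AND o.date >= date(e.e_date, '-365 days')
                  THEN o.total_paid ELSE 0 END) AS rolling_year_before
  FROM orders o
  LEFT JOIN enroll e ON o.user_id = e.user_id
  WHERE o.order_type = 'special'
  GROUP BY o.user_id
)
SELECT u.user_id, r.rolling_year_after, r.rolling_year_before
FROM users u
LEFT JOIN rolling_year r ON u.user_id = r.user_id
ORDER BY u.user_id
""").fetchall()

for row in rolling_totals:
    print(row)
```

Both totals come out of a single pass over orders. A user with no enrollment date gets 0 for both columns here, because comparisons against a NULL e_date fall through to the ELSE branch.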

Update value based on value from another record of same table

Here I have a sample table of website visitors. As we can see, visitors sometimes don't provide their email. They may also switch to a different email address over time.
Original table:
I want to update this table with the following requirements:
The first time a visitor provides an email, all of their past visits are tagged with that email.
All of their future visits are also tagged with that email until they switch to another email.
Expected table after update:
I was wondering if there is a way of doing it in Redshift or T-Sql?
Thanks everyone!
In SQL Server or Redshift, you can use a subquery to calculate the email:
select t.*,
coalesce(email,
max(email) over (partition by visitor_id, grp),
max(case when activity_date = first_email_date then email end) over (partition by visitor_id)
)
from (select t.*,
min(case when email is not null then activity_date end) over
(partition by visitor_id order by activity_date rows between unbounded preceding and current row) as first_email_date,
count(email) over (partition by visitor_id order by activity_date rows between unbounded preceding and current row) as grp
from t
) t;
You can then use this in an update:
update t
set email = tt.imputed_email
from (select t.*,
coalesce(email,
max(email) over (partition by visitor_id, grp),
max(case when activity_date = first_email_date then email end) over (partition by visitor_id)
) as imputed_email
from (select t.*,
min(case when email is not null then activity_date end) over
(partition by visitor_id order by activity_date rows between unbounded preceding and current row) as first_email_date,
count(email) over (partition by visitor_id order by activity_date rows between unbounded preceding and current row) as grp
from t
) t
) tt
where tt.visitor_id = t.visitor_id and tt.activity_date = t.activity_date and
t.email is null;
If we suppose that the table is named Visits and that its primary key consists of the columns Visitor_id and Activity_Date, then you can do the following in T-SQL.
Using a correlated subquery:
update a
set a.Email = coalesce(
-- select the email used previously
(
select top 1 Email from Visits
where Email is not null and Activity_Date < a.Activity_Date and Visitor_id = a.Visitor_id
order by Activity_Date desc
),
-- if there was no email used previously then select the email used next
(
select top 1 Email from Visits
where Email is not null and Activity_Date > a.Activity_Date and Visitor_id = a.Visitor_id
order by Activity_Date
)
)
from Visits a
where a.Email is null;
Using a window function to provide the ordering:
update v
set Email = vv.Email
from Visits v
join (
select
v.Visitor_id,
coalesce(a.Email, b.Email) as Email,
v.Activity_Date,
row_number() over (partition by v.Visitor_id, v.Activity_Date
order by a.Activity_Date desc, b.Activity_Date) as Row_num
from Visits v
-- previous visits with email
left join Visits a
on a.Visitor_id = v.Visitor_id
and a.Email is not null
and a.Activity_Date < v.Activity_Date
-- next visits with email if there are no previous visits
left join Visits b
on b.Visitor_id = v.Visitor_id
and b.Email is not null
and b.Activity_Date > v.Activity_Date
and a.Visitor_id is null
where v.Email is null
) vv
on vv.Visitor_id = v.Visitor_id
and vv.Activity_Date = v.Activity_Date
where
vv.Row_num = 1;
For each visitor_id you can update a null email with the previous non-null value. If there is none, use the next non-null value. You can get those values as follows:
select
v.*, v_prev.email prev_email, v_next.email next_email
from
visits v
left join visits v_prev on v.visitor_id = v_prev.visitor_id
and v_prev.activity_date = (select max(v2.activity_date) from visits v2 where v2.visitor_id = v.visitor_id and v2.activity_date < v.activity_date and v2.email is not null)
left join visits v_next on v.visitor_id = v_next.visitor_id
and v_next.activity_date = (select min(v2.activity_date) from visits v2 where v2.visitor_id = v.visitor_id and v2.activity_date > v.activity_date and v2.email is not null)
where
v.email is null
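The "previous non-null email, otherwise the next one" rule can be tried out with a short sketch in Python's sqlite3 module. The table name visits, the key (visitor_id, activity_date) and the sample rows are assumptions for illustration. The fill values are computed with a SELECT first and applied afterwards, so the update never reads rows it has already changed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE visits (visitor_id INTEGER, activity_date TEXT, email TEXT,
                     PRIMARY KEY (visitor_id, activity_date));
INSERT INTO visits VALUES
  (1, '2020-01-01', NULL),
  (1, '2020-01-02', 'a@x.com'),
  (1, '2020-01-03', NULL),
  (1, '2020-01-04', 'b@x.com'),
  (1, '2020-01-05', NULL),
  (2, '2020-01-01', NULL);   -- this visitor never gave an email
""")

# For every visit without an email: take the latest earlier email,
# falling back to the earliest later email.
fills = conn.execute("""
SELECT v.visitor_id, v.activity_date,
       COALESCE(
         (SELECT p.email FROM visits p
          WHERE p.visitor_id = v.visitor_id AND p.email IS NOT NULL
            AND p.activity_date < v.activity_date
          ORDER BY p.activity_date DESC LIMIT 1),
         (SELECT n.email FROM visits n
          WHERE n.visitor_id = v.visitor_id AND n.email IS NOT NULL
            AND n.activity_date > v.activity_date
          ORDER BY n.activity_date LIMIT 1)
       ) AS imputed_email
FROM visits v
WHERE v.email IS NULL
""").fetchall()

# Apply the computed values back to the table.
conn.executemany(
    "UPDATE visits SET email = ? WHERE visitor_id = ? AND activity_date = ?",
    [(email, visitor_id, day) for visitor_id, day, email in fills],
)

filled_visits = conn.execute("""
SELECT visitor_id, activity_date, email
FROM visits ORDER BY visitor_id, activity_date
""").fetchall()

for row in filled_visits:
    print(row)
```

Visits before the first email pick up that first email, later gaps take the most recent one, and a visitor who never provided an email stays NULL.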

How to get 1 record on the basis of two column values in a single table?

The query is
select distinct b.UserID , cast(b.entrytime as date) ,count(*) as UserCount
from [dbo].[person] as a
join [dbo].[personcookie] as b
on a.UserID = b.UserID
where cast (b.entrytime as date) >= '08/21/2020'
and cast (b.leavetime as date) <= '08/27/2020' and a.distinction = 99
group by cast(b.entrytime as date), b.UserID
If the same UserID has a count of more than 1 for the same date, it should be counted as 1. As shown in the image, UserID 10 has a count of 1 for 2020-08-26 and a count of 2 for 2020-08-27. The result should show that UserID 10 has a total count of 2 for 2020-08-26 and 2020-08-27 (because the count for 2020-08-27 should be 1), as per the requirement.
I have added an image of the tables and the output I want.
It seems you want one result row per user, so group by user, not by user and date. You want to count dates per user, but each day only once. This is a distinct count.
select
p.userid,
count(distinct cast(pc.entrytime as date)) as date_count
from dbo.person as p
join dbo.personcookie as pc on pc.userid = p.userid
where p.distinction = 99
and pc.entrytime >= '2020-08-21'
and pc.leavetime < '2020-08-28'
group by p.userid
order by p.userid;
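As a quick check of the distinct-count idea, here is a minimal sketch with Python's sqlite3 module. The rows are invented to mirror the question's description (UserID 10 with one visit on 2020-08-26 and two on 2020-08-27), and SQLite's date() replaces cast(... as date).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (UserID INTEGER PRIMARY KEY, distinction INTEGER);
CREATE TABLE personcookie (UserID INTEGER, entrytime TEXT, leavetime TEXT);
INSERT INTO person VALUES (10, 99), (11, 99), (12, 1);
INSERT INTO personcookie VALUES
  (10, '2020-08-26 09:00:00', '2020-08-26 10:00:00'),
  (10, '2020-08-27 09:00:00', '2020-08-27 09:30:00'),
  (10, '2020-08-27 15:00:00', '2020-08-27 16:00:00'),
  (11, '2020-08-22 09:00:00', '2020-08-22 10:00:00'),
  (11, '2020-08-30 09:00:00', '2020-08-30 10:00:00'),  -- outside the range
  (12, '2020-08-23 09:00:00', '2020-08-23 10:00:00');  -- distinction <> 99
""")

# One row per user; each calendar day is counted once.
day_counts = conn.execute("""
SELECT p.UserID, COUNT(DISTINCT date(pc.entrytime)) AS date_count
FROM person p
JOIN personcookie pc ON pc.UserID = p.UserID
WHERE p.distinction = 99
  AND pc.entrytime >= '2020-08-21'
  AND pc.leavetime < '2020-08-28'
GROUP BY p.UserID
ORDER BY p.UserID
""").fetchall()

for row in day_counts:
    print(row)
```

The two visits by UserID 10 on 2020-08-27 collapse into a single day, so that user ends up with a count of 2.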
You seem to want dense_rank():
select p.UserID, cast(pc.entrytime as date),
dense_rank() over (partition by p.userID order by min(pc.entrytime)) as usercount
from [dbo].[person] p join
[dbo].[personcookie] pc
on pc.UserID = p.UserID
where cast(pc.entrytime as date) >= '2020-08-21' and
cast(pc.leavetime as date) <= '2020-08-27'
group by cast(pc.entrytime as date), p.UserID;
Notes:
The only "real" change is using dense_rank(), which enumerates the days for a given user.
Use meaningful table aliases, rather than arbitrary letters.
Use standard date/time constants. In SQL Server, that is either YYYYMMDD or YYYY-MM-DD.

How to calculate running sums with append-only rows

I have a table where rows are never mutated but only inserted; they are immutable records. It has the following fields:
id: int
user_id: int
created: datetime
is_cool: boolean
likes_fruits: boolean
An object is tied to a user, and the "current" object for a given user is the one that has the latest created date. E.g. if I want to update is_cool for a user, I'd append a record with a new created timestamp and is_cool=true.
I want to calculate how many users are is_cool at the end of each day. I.e. I'd like the output table to have the columns:
day: some kind of date_trunc('day', created)
cool_users_count: number of users that have is_cool at the end of this day.
What SQL query can I write that does this? FWIW I'm using Presto (or Redshift if need be).
Note that there are other columns, e.g. likes_fruits, which means a record where is_cool is false does not mean is_cool was just changed to false - it could have been false for a while.
This is what procedural pseudo-code would look like to represent what I'd want to do in SQL:
// rows = ...
min_date = min([row.created for row in rows])
max_date = max([row.created for row in rows])
counts_by_day = {}
for date in range(min_date, max_date):
rows_up_until_date = [row for row in rows if row.created <= date]
latest_row_by_user = rows_up_until_date.reduce(
{},
(acc, row) => acc[row.user_id] = row,
)
counts_by_day[date] = latest_row_by_user.filter(row => row.is_cool).length
You can do this using just a query. Try using a sum on the boolean and a group by:
select date(created), sum(is_cool)
from my_table
group by date(created)
or if you need the number of users
select t.date_created, count(*) num_user
from (
select distinct date(created) date_created, user_id
from my_table
where is_cool = TRUE
) t
group by t.date_created
or if you need the last value of is_cool
select date(max_date), sum(is_cool)
from (
select t.user_id, t.max_date, m.is_cool
from my_table m
inner join (
select max(created) max_date, user_id
from my_table
group by user_id, date(created)
) t on t.max_date = m.created
and t.user_id = m.user_id
where m.is_cool = TRUE
) t2
group by date(max_date)
A correlated subquery might be the simplest solution. The following gets the value of is_cool for each user on each date:
select u.user_id, d.date,
(select t.is_cool
from t
where t.user_id = u.user_id and
t.created < dateadd(day, 1, d.date)
order by t.created desc
limit 1
) as is_cool
from (select distinct date(created) as date
from t
) d cross join
(select distinct user_id
from t
) u ;
Then aggregate:
select date, sum(is_cool)
from (select u.user_id, d.date,
(select t.is_cool
from t
where t.user_id = u.user_id and
t.created < dateadd(day, 1, d.date)
order by t.created desc
limit 1
) as is_cool
from (select distinct date(created) as date
from t
) d cross join
(select distinct user_id
from t
) u
) ud
group by date;
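The correlated-subquery approach can be checked on a few rows with Python's sqlite3 module. This is only a sketch under assumptions: the sample data is made up, date(d.day, '+1 day') stands in for dateadd, and only days that have at least one record appear in the output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t (id INTEGER PRIMARY KEY, user_id INTEGER, created TEXT,
                is_cool INTEGER, likes_fruits INTEGER);
INSERT INTO t VALUES
  (1, 1, '2023-01-01 09:00:00', 1, 0),
  (2, 2, '2023-01-01 10:00:00', 0, 0),
  (3, 1, '2023-01-02 08:00:00', 1, 1),  -- only likes_fruits changed
  (4, 2, '2023-01-02 12:00:00', 1, 0),
  (5, 1, '2023-01-03 09:00:00', 0, 1),
  (6, 3, '2023-01-03 10:00:00', 1, 0);
""")

# For every (day, user) pair, look up the user's latest record created
# before the end of that day, then sum is_cool per day.
cool_counts = conn.execute("""
SELECT day, SUM(is_cool) AS cool_users_count
FROM (
  SELECT u.user_id, d.day,
         (SELECT x.is_cool
          FROM t x
          WHERE x.user_id = u.user_id
            AND x.created < date(d.day, '+1 day')
          ORDER BY x.created DESC
          LIMIT 1) AS is_cool
  FROM (SELECT DISTINCT date(created) AS day FROM t) d
  CROSS JOIN (SELECT DISTINCT user_id FROM t) u
) ud
GROUP BY day
ORDER BY day
""").fetchall()

for row in cool_counts:
    print(row)
```

User 2 has no record on 2023-01-03 but still counts as cool that day, because the latest record up to the end of the day is used. A user with no record yet contributes NULL, which SUM ignores.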

What's the proper SQL query to find a 'status change' before given date?

I have a table of logged 'status changes'. I need to find the latest status change for a user, and if it was a) a certain 'type' of status change (s.new_status_id), and b) greater than 7 days old (s.change_date), then include it in the results. My current query sometimes returns the second-to-latest status change for a given user, which I don't want -- I only want to evaluate the last one.
How can I modify this query so that it will only include a record if it is the most recent status change for that user?
Query
SELECT DISTINCT ON (s.applicant_id) s.applicant_id, a.full_name, a.email_address, u.first_name, s.new_status_id, s.change_date, a.applied_class
FROM automated_responses_statuschangelogs s
INNER JOIN application_app a on (a.id = s.applicant_id)
INNER JOIN accounts_siuser u on (s.person_who_modified_id = u.id)
WHERE now() - s.change_date > interval '7' day
AND s.new_status_id IN
(SELECT current_status
FROM application_status
WHERE status_phase_id = 'In The Flow'
)
ORDER BY s.applicant_id, s.change_date DESC, s.new_status_id, s.person_who_modified_id;
You can use row_number() to filter one entry per applicant:
select *
from (
select row_number() over (partition by applicant_id
order by change_date desc) rn
, *
from automated_responses_statuschangelogs
) as lc
join application_app a
on a.id = lc.applicant_id
join accounts_siuser u
on lc.person_who_modified_id = u.id
join application_status stat
on lc.new_status_id = stat.current_status
where lc.rn = 1
and stat.status_phase_id = 'In The Flow'
and lc.change_date < now() - interval '7' day
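The key point is that row_number() picks the latest change first and the status and age filters are applied afterwards. Here is a minimal sketch with Python's sqlite3 module that shows the effect. The table is cut down to the relevant columns under a shortened, hypothetical name (statuschangelogs), the sample rows are invented, and a fixed as_of date replaces now() so the result is reproducible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE application_status (current_status TEXT, status_phase_id TEXT);
CREATE TABLE statuschangelogs (applicant_id INTEGER, new_status_id TEXT,
                               change_date TEXT);
INSERT INTO application_status VALUES
  ('screening', 'In The Flow'),
  ('interview', 'In The Flow'),
  ('rejected',  'Closed');
INSERT INTO statuschangelogs VALUES
  (1, 'screening', '2023-01-01'),
  (1, 'rejected',  '2023-01-20'),  -- latest change is not 'In The Flow'
  (2, 'screening', '2023-01-10'),  -- latest, in the flow, older than 7 days
  (3, 'interview', '2023-01-30');  -- latest, in the flow, but too recent
""")

as_of = '2023-02-01'  # stand-in for now()

stale_applicants = conn.execute("""
SELECT lc.applicant_id, lc.new_status_id, lc.change_date
FROM (
  SELECT s.*,
         ROW_NUMBER() OVER (PARTITION BY applicant_id
                            ORDER BY change_date DESC) AS rn
  FROM statuschangelogs s
) lc
JOIN application_status stat
  ON lc.new_status_id = stat.current_status
WHERE lc.rn = 1
  AND stat.status_phase_id = 'In The Flow'
  AND lc.change_date < date(?, '-7 days')
ORDER BY lc.applicant_id
""", (as_of,)).fetchall()

for row in stale_applicants:
    print(row)
```

Applicant 1 is left out even though an older 'screening' change matches both filters, which is the second-to-latest row the original DISTINCT ON query could return.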