I have a question about how to write a self-join query.
The Online Session table holds all user activities. Each activity has a State ID and a TimeStamp that records the user's login time.
It looks like this:
State TimeStamp User
X 1300 A
Y 1700 A
X 0700 B
Z 1500 B
Y 1600 B
X 2100 C
A little explanation: in the table above, User A logged in to State X at 1300 and then logged in to State Y at 1700, so User A spent 0400 (assume that means 4 hours) in State X.
The same logic applies to User B.
User C never changed state, so we use the current time minus the login timestamp of X.
The output should look like:
State Time User
X 0400(or 4) A
X 0800(or 8) B
Z 0100(or 1) B
X result of Now-2100 C
.......
Edit: just to make the problem clearer, let's assume it's SQL Server, but it's OK to use another DBMS.
The input timestamps are stored in the default datetime format, like YYYY-MM-DD HH:MM:SS.
You didn't mention which DBMS you're using, so I'm writing this how I'd do it in MS SQL Server (TSQL). You'll need access to the LAG function, which is not universal.
What LAG does is allow you to compare values from a previous row, based on some shared column value, in this case User. This code catches those comparisons in the prev_ fields. I'm using count() to differentiate users with more than one line from users with only one line. The single-line users are handled separately after the union all.
You'll notice that I'm not using your field names until the final output step. This is because State, Timestamp and User are all reserved words, i.e. words that do something in SQL code. I strongly recommend you use field names that are not reserved words.
This code does have a major limitation: the now-minus-login-time portion only works within the same day, so in your example it would have to be between 21:01 and 23:59 on the same day. If you wanted to do this robustly you'd store your times as datetime values, which would make this a lot easier and eliminate the limitation (a datetime-based sketch follows the query below). But this answer is for your data as posted, so:
SELECT
     b.prev_state                 AS [State]
    ,b.Online_time - b.prev_time  AS [Time]
    ,b.U_ID                       AS [User]
FROM
    (SELECT
         t.Online_state
        ,t.U_ID
        ,t.Online_time
        ,LAG(t.Online_time)  OVER (PARTITION BY t.U_ID ORDER BY t.U_ID, t.Online_time) AS prev_time
        ,LAG(t.Online_state) OVER (PARTITION BY t.U_ID ORDER BY t.U_ID, t.Online_time) AS prev_state
     FROM online_t AS t
     INNER JOIN
         (SELECT
              U_ID
             ,COUNT(U_ID) AS tot
          FROM online_t
          GROUP BY U_ID) AS a
         ON t.U_ID = a.U_ID
     WHERE tot > 1) AS b          -- users with more than one row: time spent = gap to the previous row
WHERE prev_time IS NOT NULL
UNION ALL
SELECT                            -- users with a single row: "now" minus login time, same day only
     t.Online_state AS [State]
    ,CONCAT(DATEPART(hh, GETDATE()), '00') - t.Online_time AS [Time]
    ,t.U_ID AS [User]
FROM online_t AS t
INNER JOIN
    (SELECT
         U_ID
        ,COUNT(U_ID) AS tot
     FROM online_t
     GROUP BY U_ID) AS a
    ON t.U_ID = a.U_ID
WHERE tot = 1
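If you do store the times as datetime values (as in the question's edit), a hedged sketch of the more robust version could look like this. It uses LEAD() with GETDATE() as the default, so each user's last row is measured against "now"; note that, unlike the query above, it also reports the time-so-far in every user's latest state.
-- Sketch only, assuming online_t.Online_time is a DATETIME rather than an integer like 1300
SELECT
     t.Online_state AS [State]
    ,DATEDIFF(MINUTE,
              t.Online_time,
              LEAD(t.Online_time, 1, GETDATE())
                  OVER (PARTITION BY t.U_ID ORDER BY t.Online_time)) AS [MinutesInState]
    ,t.U_ID AS [User]
FROM online_t AS t;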
I have a solution using Oracle analytic functions, which may not be available to you. I'm also storing the timestamps as Oracle VARCHARs.
I'm using LEAD() in a subquery to return the "next user" and the "next time", and then a CASE statement to handle the different scenarios.
SELECT M.THESTATE,
CASE
WHEN M.USERID = M2.NEXT_USER THEN M2.NEXT_TIME-M.THETIME
WHEN M.USERID <> M2.NEXT_USER THEN NULL
ELSE M.THETIME-0 END AS TOTALTIME
,M.USERID
FROM MYTEST M
JOIN
(
SELECT USERID, THESTATE, THETIME
,LEAD(THETIME) OVER (ORDER BY USERID, THETIME) AS NEXT_TIME
,LEAD(USERID) OVER (ORDER BY USERID, THETIME) AS NEXT_USER
FROM MYTEST
ORDER BY USERID
) M2 ON M2.USERID = M.USERID AND M2.THESTATE=M.THESTATE
WHERE
CASE
WHEN M.USERID = M2.NEXT_USER THEN M2.NEXT_TIME-M.THETIME
WHEN M.USERID <> M2.NEXT_USER THEN NULL
ELSE M.THETIME-0 END
IS NOT NULL;
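A variation on the same idea: if the LEAD() is partitioned by USERID, the "next" row can never come from a different user, so the NEXT_USER comparison isn't needed. A rough sketch against the same MYTEST table (times still treated as numbers; each user's final, still-open state is simply dropped here):
SELECT THESTATE,
       NEXT_TIME - THETIME AS TOTALTIME,  -- time until the same user's next login
       USERID
FROM (
    SELECT USERID, THESTATE, THETIME,
           LEAD(THETIME) OVER (PARTITION BY USERID ORDER BY THETIME) AS NEXT_TIME
    FROM MYTEST
)
WHERE NEXT_TIME IS NOT NULL;              -- NULL marks the user's last row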
Including your input in a WITH clause (I use the TIMESTAMP type for your "timestamp" column, and note that some databases don't like it if you use reserved words such as "user" and "timestamp" for column names), try this:
WITH
-- input, don't use in query
input(state,"timestamp","user") AS (
SELECT 'X',TIMESTAMP '2017-03-15 13:00:00','A'
UNION ALL SELECT 'Y',TIMESTAMP '2017-03-15 17:00:00','A'
UNION ALL SELECT 'X',TIMESTAMP '2017-03-15 07:00:00','B'
UNION ALL SELECT 'Z',TIMESTAMP '2017-03-15 15:00:00','B'
UNION ALL SELECT 'Y',TIMESTAMP '2017-03-15 16:00:00','B'
UNION ALL SELECT 'X',TIMESTAMP '2017-03-15 21:00:00','C'
)
,
-- start real query here, comma above would
-- be the WITH keyword
state_duration_user AS (
SELECT
state
, IFNULL(
LEAD("timestamp") OVER(ORDER BY "timestamp")
, CURRENT_TIMESTAMP
) - "timestamp"
AS "time"
, "user"
FROM input
)
SELECT
state
, CAST(SUM("time") AS TIME(0)) AS "time"
, "user"
FROM state_duration_user
GROUP BY
state
, "user"
;
state|time |user
Y |04:00:00|A
Y |01:00:00|B
Z |01:00:00|B
X |02:00:00|A
X |06:00:00|B
X |07:59:19|C
Related
I have two datasets hosted in Snowflake with social media follower counts by day. The main table we will be using going forward (follower_counts) shows follower counts by day:
This table is live as of 4/4/2020 and will be updated daily. Unfortunately, I am unable to get historical data in this format. Instead, I have a table with historical data (follower_gains) that shows net follower gains by day for several accounts:
Ideally, I want to take the follower_count value from the minimum date in the current table (follower_counts) and subtract the sum of gains (organic + paid gains) for each day, back to the minimum date of the follower_gains table, to fill in follower_count historically. In addition, there are several accounts with data in these tables, so it would need to be grouped by account. It should look like this:
I've only gotten as far as unioning these two tables together, but don't even know where to start with looping through these rows:
WITH a AS (
SELECT
account_id,
date,
organizational_entity,
organizational_entity_type,
vanity_name,
localized_name,
localized_website,
organization_type,
total_followers_count,
null AS organic_follower_gain,
null AS paid_follower_gain,
account_name,
last_update
FROM follower_counts
UNION ALL
SELECT
account_id,
date,
organizational_entity,
organizational_entity_type,
vanity_name,
localized_name,
localized_website,
organization_type,
null AS total_followers_count,
organic_follower_gain,
paid_follower_gain,
account_name,
last_update
FROM follower_gains)
SELECT
a.account_id,
a.date,
a.organizational_entity,
a.organizational_entity_type,
a.vanity_name,
a.localized_name,
a.localized_website,
a.organization_type,
a.total_followers_count,
a.organic_follower_gain,
a.paid_follower_gain,
a.account_name,
a.last_update
FROM a
ORDER BY date desc LIMIT 100
UPDATE: Changed union to union all and added not exists to remove duplicates. Made changes per the comments.
NOTE: Please make sure you don't post images of the tables. It's difficult to recreate your scenario to write a correct query. Test this solution and update so that I can make modifications if necessary.
You don't loop through rows in SQL because it's not a procedural language. The operation you define in the query is performed over all the rows in a table.
with cte as (SELECT a.account_id,
a.date,
a.organizational_entity,
a.organizational_entity_type,
a.vanity_name,
a.localized_name,
a.localized_website,
a.organization_type,
(a.follower_count - (b.organic_gain+b.paid_gain)) AS follower_count,
a.account_name,
a.last_update,
b.organic_gain,
b.paid_gain
FROM follower_counts a
JOIN follower_gains b ON a.account_id = b.account_id
AND b.date < (select min(date) from
follower_counts c where a.account_id = c.account_id)
)
SELECT b.account_id,
b.date,
b.organizational_entity,
b.organizational_entity_type,
b.vanity_name,
b.localized_name,
b.localized_website,
b.organization_type,
b.follower_count,
b.account_name,
b.last_update,
b.organic_gain,
b.paid_gain
FROM cte b
UNION ALL
SELECT a.account_id,
a.date,
a.organizational_entity,
a.organizational_entity_type,
a.vanity_name,
a.localized_name,
a.localized_website,
a.organization_type,
a.follower_count,
a.account_name,
a.last_update,
NULL as organic_gain,
NULL as paid_gain
FROM follower_counts a where not exists (select 1 from
follower_gains c where a.account_id = c.account_id AND a.date = c.date)
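For comparison, here is an alternative, untested sketch of the same backfill idea that uses a running window total instead of a per-row subtraction. Column names are taken from the question, and the exact day-boundary convention (whether a day's gain is already included in that day's count) may need a one-day adjustment:
WITH anchor AS (
    -- each account's earliest real count, used as the starting point
    SELECT fc.account_id,
           fc.date                  AS anchor_date,
           fc.total_followers_count AS anchor_count
    FROM follower_counts fc
    WHERE fc.date = (SELECT MIN(c.date)
                     FROM follower_counts c
                     WHERE c.account_id = fc.account_id)
)
SELECT g.account_id,
       g.date,
       a.anchor_count
       - SUM(g.organic_follower_gain + g.paid_follower_gain)
             OVER (PARTITION BY g.account_id
                   ORDER BY g.date DESC
                   ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS backfilled_follower_count
FROM follower_gains g
JOIN anchor a
  ON a.account_id = g.account_id
 AND g.date < a.anchor_date
ORDER BY g.account_id, g.date;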
You could do something like this; instead of using the variable, you can just wrap the whole expression in another set of brackets and write ) AS FollowerGrowth at the end (see the sketch after the code):
DECLARE @FollowerGrowth INT =
    ( SELECT total_followers_count
      FROM follower_gains
      WHERE AccountID = xx )
    -
    ( SELECT TOP 1 follower_count
      FROM follower_counts
      WHERE AccountID = xx
      ORDER BY date ASC )
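For reference, the bracket-wrapped form mentioned above would look something like this (xx stays a placeholder for the account id):
SELECT
    ( SELECT total_followers_count
      FROM follower_gains
      WHERE AccountID = xx )
    -
    ( SELECT TOP 1 follower_count
      FROM follower_counts
      WHERE AccountID = xx
      ORDER BY date ASC ) AS FollowerGrowth;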
I generated a query to pull in unique logins by date and the average duration of each. What I am now trying to do is add a column representing total logins.
The way I'm pulling in the unique logins is by subtracting the maximum and minimum created date (as this is needed to calculate average duration).
I now have a simple query to calculate all logins (not simply unique logins).
Based on my query - how can I add a column so I will have logins, unique logins, average_duration, and the login_date?
MY QUERY:
SELECT
COUNT(unique_session_ids) as unique_logins
,AVG(
CASE WHEN duration > '0'
THEN duration
END) as average_duration
,login_date
FROM(
SELECT a.session_id as unique_session_ids
,MAX(a.created)-min(a.created) as duration
,MIN(to_char(b.created,'mm-dd')) as login_date
FROM base_identity a
INNER JOIN base_identity b
ON a.session_id = b.session_id
WHERE a.source_system_id = 11
AND a.created >= '2018-12-01'
GROUP BY a.session_id) x
GROUP BY login_date;
WHAT I WANT TO ADD TO THAT:
SELECT COUNT(session_id) as logins
FROM base_identity
GROUP BY to_char(created,'mm-dd')
So, I essentially just want the logins and unique logins represented together.
Thanks!
You can select from multiple tables with the following syntax
SELECT Table1.column_name, Table2.column_name
FROM Table1, Table2
In your case, it would look like
SELECT
COUNT(x.unique_session_ids) as unique_logins
,AVG(CASE WHEN x.duration > '0' THEN x.duration END) as average_duration
,x.login_date
,COUNT(y.session_id)
FROM([...]) x
, base_identity y
GROUP BY x.login_date;
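Another way to sketch it, keeping the two aggregations independent of each other, is to compute the day-level login totals in their own derived table (with the same filters as your query) and join the two results on login_date:
SELECT
     u.unique_logins
    ,u.average_duration
    ,u.login_date
    ,l.logins
FROM (
    SELECT
         COUNT(unique_session_ids) AS unique_logins
        ,AVG(CASE WHEN duration > '0' THEN duration END) AS average_duration
        ,login_date
    FROM (
        SELECT a.session_id                     AS unique_session_ids
              ,MAX(a.created) - MIN(a.created)  AS duration
              ,MIN(to_char(b.created, 'mm-dd')) AS login_date
        FROM base_identity a
        INNER JOIN base_identity b
                ON a.session_id = b.session_id
        WHERE a.source_system_id = 11
          AND a.created >= '2018-12-01'
        GROUP BY a.session_id
    ) x
    GROUP BY login_date
) u
JOIN (
    -- total logins per day, same filters as above
    SELECT to_char(created, 'mm-dd') AS login_date
          ,COUNT(session_id)         AS logins
    FROM base_identity
    WHERE source_system_id = 11
      AND created >= '2018-12-01'
    GROUP BY to_char(created, 'mm-dd')
) l
  ON u.login_date = l.login_date;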
I am unable to write the SQL for my problem.
I have a table with two columns: item code and expiration date.
Itemcode Expiration
Abc123 2014-08-08
Abc234 2014-07-07
Cfg345 2014-06-06
Cfg567 2014-07-08
The output should be based on the first 3 characters of the item code and the minimum expiration date, like below:
Abc 2014-07-07 Abc234
Cfg 2014-06-06 Cfg345
Thanks
EDITED:
The query goes like this; it actually joins multiple tables to fetch the itemcode and expiration.
select substr(y.itemcode,1,3),
min(x.expiration_date) expiry,
y.itemcode
from X x, Y y
where y.id = x.id
and x.number in
(select number from xyz
where id = x.id
and codec in ('C', 'M', 'T', 'H')
)
group by substr(y.itemcode,1,3), y.itemcode
I am not familiar with "m". Here is an ANSI standard SQL solution:
select substring(itemcode, 1, 3), expiration, itemcode
from (select t.*,
row_number() over (partition by substring(itemcode, 1, 3)
order by expiration asc
) as seqnum
from table t
) t
where seqnum = 1;
Most databases support this functionality. Some might have slightly different names (such as substr() or left() for the substring operation).
I know that there are some threads on this subject, however, my query is slightly different to what I've seen and the solutions presented before don't seem to be working for me.
I have two tables, X and Y, simplified here to a single ID; in reality, of course, I have multiple IDs. Each period lasts from the Date given until the beginning of the next period.
ID Date Period
A 12/01/2010 1
A 12/03/2010 2
A 15/06/2010 3
A 17/08/2010 4
A 20/10/2010 5
and
ID SampleDate
A 20/01/2010
A 25/01/2010
A 21/11/2010
What I need to get is:
ID SampleDate Period
A 20/01/2010 1
A 25/01/2010 1
A 21/11/2010 5
I've tried this:
with cte as
(
select
Y.ID,
Y.sampleDate,
X.Period,
ROW_NUMBER() over (PARTITION by Y.ID, Y.sampleDate order by DATEDIFF(day,X.Date, Y.sampleDate)) as DaysSince
from X
left join Y
on X.ID=Y.ID
)
select ID,
sampleDate,
Period
from cte
where DaysSince=1
This produces the correct size of table, but instead of giving the respective periods for the samples, it just prints out the top period number for all of them (for a given ID).
Any idea where I'm making a mistake?
There is nothing in your query that removes entries with a negative datediff, so add that condition to the join:
with cte as
(
select
Y.ID,
Y.sampleDate,
X.Period,
ROW_NUMBER() over (PARTITION by Y.ID, Y.sampleDate order by DATEDIFF(day,X.Date, Y.sampleDate)) as DaysSince
from X
left join Y
on X.ID=Y.ID and X.Date < Y.sampleDate /* skip periods after the one we're interested in */
)
select ID,
sampleDate,
Period
from cte
where DaysSince=1
I have a data set where the structure could be like this
yes_no date
0 1/1/2011
1 1/1/2011
1 1/2/2011
0 1/4/2011
1 1/9/2011
Given a start date and an end date, I would like to create a query that aggregates over the date and provides a 0 for dates that do not exist in the table, for all dates between start_date and end_date, including both.
This is in SQL.
I am stumped. I can get the aggregate queries very simply, but I don't know how to get zeros for dates that do not exist in the table.
If you're working with a DBMS that supports common table expressions, the following will generate a derived table of dates that you can then left join to your table. This was written for MSSQL, so you may need to derive your dates differently (e.g., from an object other than master..spt_values).
with AllDates as (
select top 100000
convert(datetime, row_number() over (order by x.name)) as 'Date'
from
master..spt_values x
cross join master..spt_values y
)
select
ad.Date, isnull(yt.yn, 0)
from
AllDates ad
left join (
select date, sum(yes_no) yn
from YourTable
group by date
) yt
on ad.date = yt.date
where
ad.Date between YourStartDate and YourEndDate
Generating the dates has to be the way to go.
In Oracle you could join on to a generated list of dates:
(SELECT TRUNC(startdate + LEVEL)
FROM DUAL CONNECT BY LEVEL < (enddate - startdate))
If you can't generate your dates on the fly, a database-agnostic solution would be to create a table containing all of the dates you will ever need and join on to that (this should be your last resort).
Here's the pseudocode; you will need to substitute mydates with either the on-the-fly SQL or the date-table select:
SELECT
    a.date,
    CASE WHEN COUNT(b.date) = 0
         THEN 0
         ELSE 1
    END AS yes_no
FROM (mydates) a
LEFT JOIN aggtable b ON a.date = b.date
GROUP BY a.date
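Putting the two pieces together, a hedged Oracle sketch might look like the following; yourtable and yn_date are assumed names for your table and its date column, and :startdate/:enddate are bind-variable placeholders for the range:
SELECT d.day_date,
       NVL(SUM(t.yes_no), 0) AS yes_no_total
FROM (
    -- one row per day in the range, inclusive of both ends
    SELECT TRUNC(:startdate) + LEVEL - 1 AS day_date
    FROM DUAL
    CONNECT BY LEVEL <= TRUNC(:enddate) - TRUNC(:startdate) + 1
) d
LEFT JOIN yourtable t
       ON TRUNC(t.yn_date) = d.day_date
GROUP BY d.day_date
ORDER BY d.day_date;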