How to add date rows for messages query? - sql

I have a Messages table:
id | sender_id | message | date
1 | 1 | Cya | 10/10/2020
2 | 2 | Bye | 10/10/2020
3 | 1 | Heya | 10/11/2020
I want to insert date rows and a type column based on the date, so it looks like this.
id | sender_id | message | date | type
1 | null | null | 10/10/2020 | date
1 | 1 | Cya | 10/10/2020 | message
2 | 2 | Bye | 10/10/2020 | message
2 | null | null | 10/11/2020 | date
3 | 1 | Heya | 10/11/2020 | message
3 | null | null | 10/11/2020 | date
When ordering by date, type, the first and the last rows are dates, and there is a date row between every two messages with different dates, carrying the later date's value.
I have no idea how to tackle this one. Please tell me if you have any ideas on how to approach it.

This is quite complicated, because you want each new row to contain the next date but the previous date's max id (if it exists), plus 1 extra row at the end.
So you can use UNION ALL for 3 separate cases:
select id, sender_id, message, date, type
from (
  select id, sender_id, message, date, 'message' as type, 2 sort
  from Messages
  union all
  select lag(max(id), 1, min(id)) over (order by date), null, null, date, 'date', 1
  from Messages
  group by date
  union all
  select * from (
    select id, null, null, date, 'date', 3
    from Messages
    order by date desc, id desc limit 1
  )
)
order by date, sort, id
Note that this will work only if your dates are stored in the YYYY-MM-DD format, which compares correctly as text and is the text format SQLite's date functions expect (see the sketch after the results for one way to normalize other formats).
Results:
id | sender_id | message | date       | type
1  | null      | null    | 2020-10-10 | date
1  | 1         | Cya     | 2020-10-10 | message
2  | 2         | Bye     | 2020-10-10 | message
2  | null      | null    | 2020-10-11 | date
3  | 1         | Heya    | 2020-10-11 | message
3  | null      | null    | 2020-10-11 | date
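If the dates are currently stored as text in another format, they would need to be normalized first. A minimal, untested sketch, assuming the values are MM/DD/YYYY text as the samples suggest:

-- Rewrite MM/DD/YYYY text dates as YYYY-MM-DD so they compare correctly as text.
-- One-off cleanup; assumes every value matches the MM/DD/YYYY pattern.
update Messages
set date = substr(date, 7, 4) || '-' || substr(date, 1, 2) || '-' || substr(date, 4, 2)
where date like '__/__/____';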

Hmmm . . . I think you want union all:
select id, sender_id, message, date, 'message' as type
from t
union all
select id, null, null, date, 'date'
from t
order by id;
EDIT:
Based on your comment:
select id, sender_id, message, date, 'message' as type
from t
union all
select min(id), null, null, date, 'date'
from t
group by date
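To get the interleaving described in the question, an ORDER BY can be added to this union (a rough, untested sketch; it assumes YYYY-MM-DD dates so that text comparison works, and relies on 'date' sorting before 'message'):

select id, sender_id, message, date, 'message' as type
from t
union all
select min(id), null, null, date, 'date'
from t
group by date
order by date, type, id;

This puts one leading date row before each day's messages; the extra trailing date row at the very end would still need the third UNION ALL branch used in the first answer.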

Related

How to check record by record if one date is between two other dates from a second table with multiple date ranges?

I am facing the following problem. I searched for hours for a similar question, but couldn't find an answer.
Question:
How to check if there is a range that contains a given date using SQL?
This is more of a general question as stated in the subject, but below you can find a little context.
I want to:
calculate whether there was an active subscription for a specific user on a given date.
Below I attach the sample tables. I want to use this later for calculations of retention/churn/reactivations etc.
the tables are in BigQuery, so it is a Standard SQL question.
Given:
Table 1: User_id and a date for which I want to check whether there was an active subscription on that date
Table 2: Subscription transactions with date of transaction and expiry date
Desired output:
Table 3: Table 1 with "check" column if there is any record in the second table that it's range contains this Table1.Date
Table 1: User_date
| User_id | Date       |
|---------|------------|
| 1       | 2020-10-31 |
| 1       | 2020-11-30 |
| 2       | 2020-10-31 |
| 2       | 2020-11-30 |
| 3       | 2020-10-31 |
Table 2: Subscription_transactions
| Transaction_date | Transaction_expiry | User_id |
|------------------|--------------------|---------|
| 2020-10-01       | 2020-10-28         | 1       |
| 2020-10-29       | 2020-11-15         | 1       |
| 2020-10-15       | 2020-11-15         | 2       |
| 2020-09-29       | 2020-10-15         | 3       |
Table 3: Desired Output
| User_id | Date       | Is_active |
|---------|------------|-----------|
| 1       | 2020-10-31 | TRUE      |
| 1       | 2020-11-30 | FALSE     |
| 2       | 2020-10-31 | TRUE      |
| 2       | 2020-11-30 | FALSE     |
| 3       | 2020-10-31 | FALSE     |
Does this do what you want?
select ud.*,
       exists (select 1
               from Subscription_transactions st
               where ud.user_id = st.user_id and
                     ud.date between st.Transaction_date and st.Transaction_expiry
              ) as is_active
from user_date ud;
Below is for BigQuery Standard SQL:
with user_date as (
  select 1 user_id, '2020-10-31' date union all
  select 1 user_id, '2020-11-30' date union all
  select 2 user_id, '2020-10-31' date union all
  select 2 user_id, '2020-11-30' date union all
  select 3 user_id, '2020-10-31' date
),
Subscription_transactions as (
  select '2020-10-01' Transaction_date, '2020-10-28' Transaction_expiry, 1 User_id union all
  select '2020-10-29', '2020-11-15', 1 union all
  select '2020-10-15', '2020-11-15', 2 union all
  select '2020-09-29', '2020-10-15', 3
)
SELECT ud.*,
  CASE WHEN st.user_id is NULL then FALSE else TRUE end as is_active
from user_date ud
left join Subscription_transactions st
  on ud.user_id = st.user_id
  and ud.date between st.Transaction_date and st.Transaction_expiry
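Note that if a user/date pair could fall inside more than one subscription range, the LEFT JOIN above would return duplicate rows. A hedged, untested way to collapse them in BigQuery is to group and aggregate with LOGICAL_OR:

-- Collapses possible duplicate matches into a single row per user/date (sketch).
SELECT ud.user_id, ud.date,
  LOGICAL_OR(st.user_id IS NOT NULL) AS is_active
FROM user_date ud
LEFT JOIN Subscription_transactions st
  ON ud.user_id = st.user_id
  AND ud.date BETWEEN st.Transaction_date AND st.Transaction_expiry
GROUP BY ud.user_id, ud.date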

How to select the latest date for each group by number?

I've been stuck on this question for a while, and I was wondering if the community would be able to point me in the right direction.
I have some tag IDs that need to be grouped, with exceptions (column: deleted) that need to be retained in the results. After that, for each grouped tag ID, I need to select the one with the latest date. How can I do this? An example below:
ID | TAG_ID | DATE | DELETED
1 | 300 | 05/01/20 | null
2 | 300 | 03/01/20 | 04/01/20
3 | 400 | 06/01/20 | null
4 | 400 | 05/01/20 | null
5 | 400 | 04/01/20 | null
6 | 500 | 03/01/20 | null
7 | 500 | 02/01/20 | null
I am trying to reach this outcome:
ID | TAG_ID | DATE | DELETED
1 | 300 | 05/01/20 | null
2 | 300 | 03/01/20 | 04/01/20
3 | 400 | 06/01/20 | null
6 | 500 | 03/01/20 | null
So, firstly if there is a date in the "DELETED" column, I would like the row to be present. Secondly, for each unique tag ID, I would like the row with the latest "DATE" to be present.
Hopefully this question is clear. Would appreciate your feedback and help! A big thanks in advance.
Your results seem to be something like this:
select t.*
from (select t.*,
             row_number() over (partition by tag_id, deleted order by date desc) as seqnum
      from t
     ) t
where seqnum = 1 or deleted is not null;
This takes one row per tag_id where deleted is null -- the most recent one. It also keeps each row where deleted is not null.
You need 2 conditions combined with OR in the WHERE clause:
the 1st: deleted is not null, or
the 2nd: there isn't any other row with the same tag_id and a date later than the current row's date, meaning that the current row's date is the latest:
select t.*
from tablename t
where t.deleted is not null
   or not exists (
        select 1 from tablename
        where tag_id = t.tag_id and date > t.date
      )
Results:
| id | tag_id | date | deleted |
| --- | ------ | ---------- | -------- |
| 1 | 300 | 2020-05-01 | |
| 2 | 300 | 2020-03-01 | 04/01/20 |
| 3 | 400 | 2020-06-01 | |
| 6 | 500 | 2020-03-01 | |

PARTITION BY in CASE doesn't work with several AND statements

I have a table with 4 columns: hitId, userId, timestamp and Camp.
I need to classify whether a hit is the start of a new session or not (1 or 0) using two parameters: 1. the time difference between hits, and 2. whether the source of the hit is a new campaign.
I need a standard SQL query in BigQuery.
A hit is considered as a start of a new session if one of the following is true:
it's the first hit from its userId
the time difference from the previous hit of the same userId is more than 30 mins.
the time difference from the previous hit of the same userId is less than 30 mins, but the Camp (ad campaign) value is not NULL and occurs for the first time for that userId within the previous 30 min.
So if hit1 from user1 has a Camp equal to Campaign1, and hit2 from user1 has a Camp equal to Campaign1, and time difference between hit1 and hit2 is less than 30 mins, hit1 will be considered as a start of a session, and hit2 won't be considered as a start.
I am having trouble with the campaign part. I tried this code:
WITH timeDifference AS (
  SELECT *,
    TIMESTAMP_DIFF(timestamp, LAG(timestamp, 1) OVER
      (PARTITION BY userId ORDER BY timestamp), SECOND) AS difference
  FROM hitTable
  ORDER BY timestamp)
SELECT *,
  CASE
    WHEN difference >= 30 * 60 THEN 1
    WHEN difference IS NULL THEN 1
    WHEN difference <= 30 * 60 AND Camp IS NOT NULL AND RANK()
      OVER (PARTITION BY userId ORDER BY Camp) = 1 THEN 1
    ELSE 0 END AS sess
FROM timeDifference
ORDER BY timestamp;
The condition RANK() OVER (PARTITION BY userId ORDER BY Camp) does not seem to work, as I receive this table:
hitId | userId | timestamp | Camp | difference | sess
_______________________________________________________________________
00150 | 858201 | 00:48:35.315 | NULL | NULL | 1
00151 | 858201 | 00:49:35.315 | NULL | 5 | 0
00152 | 858201 | 00:50:35.315 | Search-Ads-US | 10 | 0
00153 | 858201 | 00:53:35.315 | Search-Ads-US | 15 | 0
00154 | 858202 | 00:54:35.315 | Facebook-Ads | NULL | 1
00155 | 858202 | 00:54:55.315 | Facebook-Ads | 9 | 0
00156 | 858202 | 00:57:20.315 | Facebook-Ads | 12 | 0
While I expect sess to be 1 for hitId = 00152:
hitId | userId | timestamp | Camp | difference | sess
_______________________________________________________________________
00150 | 858201 | 00:48:35.315 | NULL | NULL | 1
00151 | 858201 | 00:49:35.315 | NULL | 5 | 0
00152 | 858201 | 00:50:35.315 | Search-Ads-US | 10 | 1
00153 | 858201 | 00:53:35.315 | Search-Ads-US | 15 | 0
00154 | 858202 | 00:54:35.315 | Facebook-Ads | NULL | 1
00155 | 858202 | 00:54:55.315 | Facebook-Ads | 9 | 0
00156 | 858202 | 00:57:20.315 | Facebook-Ads | 12 | 0
This RANK() OVER (PARTITION BY userId ORDER BY Camp) returns incorrect results in cases where a user has hits with more than one Camp value.
Notice your PARTITION BY uses only userId, while you want to mark the first hit within each Camp.
The actual "rank 1" of the RANK() (...) statement for userId 858201 is the row where Camp is NULL (hitId 00150), so your CASE condition is never met at hitId 00152.
You could try adding Camp to your PARTITION BY and ranking by timestamp, so that only the first hit of each campaign per user gets rank 1:
RANK() OVER (PARTITION BY userId, Camp ORDER BY timestamp)
Alternatively, you could replace the RANK() (...) and use LAG(Camp) (... ORDER BY timestamp) in addition to the LAG(timestamp) (...) you are already calculating.
This retrieves the Camp value of the previous row (call it 'PreviousCampValue'). Then you could add something like WHEN PreviousCampValue != Camp THEN 1, as in the sketch below.
Hope that's helpful
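A minimal, untested sketch of that LAG-based idea (BigQuery Standard SQL, using the question's column names; it only compares each hit with the immediately preceding one and handles NULL campaigns explicitly):

WITH timeDifference AS (
  SELECT *,
    TIMESTAMP_DIFF(timestamp, LAG(timestamp) OVER (PARTITION BY userId ORDER BY timestamp), SECOND) AS difference,
    LAG(Camp) OVER (PARTITION BY userId ORDER BY timestamp) AS previousCampValue
  FROM hitTable
)
SELECT *,
  CASE
    WHEN difference IS NULL THEN 1      -- first hit of this userId
    WHEN difference >= 30 * 60 THEN 1   -- more than 30 minutes since the previous hit
    WHEN Camp IS NOT NULL
         AND (previousCampValue IS NULL OR previousCampValue != Camp) THEN 1  -- campaign changed
    ELSE 0
  END AS sess
FROM timeDifference
ORDER BY timestamp;

Note this marks a new session whenever the campaign changes between consecutive hits, which is slightly looser than the question's "first time within the previous 30 minutes" rule.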

Union in outer query

I'm attempting to combine multiple rows using a UNION but I need to pull in additional data as well. My thought was to use a UNION in the outer query but I can't seem to make it work. Or am I going about this all wrong?
The data I have is like this:
+------+------+-------+---------+---------+
| ID | Time | Total | Weekday | Weekend |
+------+------+-------+---------+---------+
| 1001 | AM | 5 | 5 | 0 |
| 1001 | AM | 2 | 0 | 2 |
| 1001 | AM | 4 | 1 | 3 |
| 1001 | AM | 5 | 3 | 2 |
| 1001 | PM | 5 | 3 | 2 |
| 1001 | PM | 5 | 5 | 0 |
| 1002 | PM | 4 | 2 | 2 |
| 1002 | PM | 3 | 3 | 0 |
| 1002 | PM | 1 | 0 | 1 |
+------+------+-------+---------+---------+
What I want to see is like this:
+------+---------+------+-------+
| ID | DayType | Time | Tasks |
+------+---------+------+-------+
| 1001 | Weekday | AM | 9 |
| 1001 | Weekend | AM | 7 |
| 1001 | Weekday | PM | 8 |
| 1001 | Weekend | PM | 2 |
| 1002 | Weekday | PM | 5 |
| 1002 | Weekend | PM | 3 |
+------+---------+------+-------+
The closest I've come so far is using a UNION statement like the following:
SELECT * FROM
(
SELECT Weekday, 'Weekday' as 'DayType' FROM t1
UNION
SELECT Weekend, 'Weekend' as 'DayType' FROM t1
) AS X
Which results in something like the following:
+---------+---------+
| Weekday | DayType |
+---------+---------+
| 2 | Weekend |
| 0 | Weekday |
| 2 | Weekday |
| 0 | Weekend |
| 10 | Weekday |
+---------+---------+
I don't see any rhyme or reason as to what the numbers are under the 'Weekday' column; I suspect they're being grouped somehow. And of course several other columns are missing, but since I can't put a large scope in the outer query with this as the inner one, I can't figure out how to pull those in. Help is greatly appreciated.
It looks like you want to union all a pair of aggregation queries that use sum() and group by id, time, one for Weekday and one for Weekend:
select Id, DayType = 'Weekend', [time], Tasks=sum(Weekend)
from t
group by id, [time]
union all
select Id, DayType = 'Weekday', [time], Tasks=sum(Weekday)
from t
group by id, [time]
Try with this:
select ID, 'Weekday' as DayType, Time, sum(Weekday) as Tasks
from t1
group by ID, Time
union all
select ID, 'Weekend', Time, sum(Weekend)
from t1
group by ID, Time
order by 1, 3, 2
Not tested, but it should do the trick. It may require 2 proc sql steps for the calculation, one for summing and one for the case when statements. If you have extra lines, just use a max statement and group by ID, Time, type_day.
Proc sql; create table want as select ID, Time,
sum(weekday) as weekdayTask,
sum(weekend) as weekendTask,
case when calculated weekdaytask>0 then weekdaytask
when calculated weekendtask>0 then weekendtask else .
end as Task,
case when calculated weekdaytask>0 then "Weekday"
when calculated weekendtask>0 then "Weekend"
end as Day_Type
from have
group by ID, Time
;quit;
Proc sql; create table want2 as select ID, Time, Day_Type, Task
from want
;quit;

How can I do SQL query count based on certain criteria including row order

I've come across certain logic that I need for my SQL query. Given that I have a table as such:
+----------+-------+------------+
| product | valid | Date |
+----------+-------+------------+
| 1 | null | 2016-05-10 |
| 1 | null | 2016-05-09 |
| 1 | yes | 2016-05-08 |
+----------+-------+------------+
This table is produced by a simple query:
SELECT * FROM products WHERE product = 1 ORDER BY date desc
Now what I need to do is create a query to count the number of nulls for certain products, in date order, until there is a yes value. So in the above example the count would be 2, as there are 2 nulls until a yes.
+----------+-------+------------+
| product | valid | Date |
+----------+-------+------------+
| 2 | null | 2016-05-10 |
| 2 | yes | 2016-05-09 |
| 2 | null | 2016-05-08 |
+----------+-------+------------+
Above would return 1 as there is 1 null until a yes.
+----------+-------+------------+
| product | valid | Date |
+----------+-------+------------+
| 3 | yes | 2016-05-10 |
| 3 | yes | 2016-05-09 |
| 3 | null | 2016-05-08 |
+----------+-------+------------+
Above would return 0.
You need a Correlated Subquery like this:
SELECT COUNT(*)
FROM products AS p1
WHERE product = 1
AND Date >
( -- maximum date with 'yes'
SELECT MAX(Date)
FROM products AS p2
WHERE p1.product = p2.product
AND Valid = 'yes'
)
This should do it:
select count(1) from products where product = 1 and valid is null and date > (select max(date) from products where product = 1 and valid = 'yes')
Not sure if the logic you provided covers all the possible weird and wonderful extreme scenarios, but the following piece of code would do what you are after:
select a.product,
       count(IIF(a.valid is null and a.date > b.Maxdate, a.date, null)) as total
from sometable a
inner join (
    select product, max(date) as Maxdate
    from sometable
    where valid = 'yes'
    group by product
) b on a.product = b.product
group by a.product
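If the database supports window functions, a hedged, untested alternative that avoids the self-join (it uses the question's products table and counts every null when a product has no 'yes' row at all):

select product,
       count(case when valid is null
                   and (last_yes_date is null or date > last_yes_date)
                  then 1 end) as nulls_until_yes   -- nulls newer than the latest 'yes'
from (
    select p.*,
           max(case when valid = 'yes' then date end) over (partition by product) as last_yes_date
    from products p
) t
group by product;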