How to fetch counts for two date fields in the same month in SQL

I am trying to write a query against a table with 3 columns:
C_Time: task creation date and time
Done_Time: task completion date and time
User ID: unique ID of the user
For a particular month, I want the total number of tasks created and the total number of tasks completed in that same month, grouped by user ID.
The output should look like this:
UserID | CreatedCount | DoneCount
------------------------------------------
U12    | 12           | 12
U13    | 7            | 5
Here user U12 created 12 tasks and completed 12 tasks in January 2020, while user U13 created 7 tasks in January 2020 and completed 5 tasks in the same month.

You can use apply to unpivot the data and then aggregate:
select t.user_id, sum(is_create), sum(is_complete)
from t cross apply
(values (t.c_time, 1, 0), (t.done_time, 0, 1)
) v(t, is_create, is_complete)
where v.t >= '2020-01-01' and v.t < '2020-02-01'
group by t.user_id;
You can also do this with conditional aggregation:
select user_id,
sum(case when c_time >= '2020-01-01' and c_time < '2020-02-01' then 1 else 0 end),
sum(case when done_time >= '2020-01-01' and done_time < '2020-02-01' then 1 else 0 end)
from t
group by user_id;
This is probably a little faster for your particular example. However, the first version is more generalizable -- for instance, it allows you to summarize easily by both user and month.
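As a rough illustration of the "summarize by both user and month" point, here is a minimal sketch that runs the unpivot-then-aggregate idea in SQLite through Python's sqlite3 module. SQLite has no CROSS APPLY, so the sketch swaps in UNION ALL for the unpivot step. The sample rows and the output column names (created_count, done_count) are assumptions made for the example, not data from the question.

```python
import sqlite3

# Hypothetical sample rows: (user_id, c_time, done_time); done_time may be NULL.
rows = [
    ("U12", "2020-01-05 09:00:00", "2020-01-06 10:00:00"),
    ("U12", "2020-01-20 09:00:00", "2020-01-25 10:00:00"),
    ("U13", "2020-01-10 09:00:00", "2020-02-02 10:00:00"),  # done in February
    ("U13", "2019-12-30 09:00:00", "2020-01-03 10:00:00"),  # created in December
    ("U13", "2020-01-15 09:00:00", None),                   # not done yet
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (user_id TEXT, c_time TEXT, done_time TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)

# Unpivot each task into a "created" event and a "done" event (UNION ALL stands
# in for CROSS APPLY here), then aggregate per user and per month.
query = """
SELECT user_id,
       substr(event_time, 1, 7) AS month,
       SUM(is_create)           AS created_count,
       SUM(is_complete)         AS done_count
FROM (
    SELECT user_id, c_time AS event_time, 1 AS is_create, 0 AS is_complete FROM t
    UNION ALL
    SELECT user_id, done_time, 0, 1 FROM t WHERE done_time IS NOT NULL
)
GROUP BY user_id, month
ORDER BY user_id, month
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

Each output row holds one user and one month, so a task created in one month and completed in the next contributes to two different rows.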

Related

T-SQL filtering records based on dates and time difference with other records

I have a table that needs a rather complex filter. First a filter by date is applied, but records from the previous and next days should also be included if the time difference to the adjacent record does not exceed 8 hours (the previous or the next record, depending on whether the date is before or after the filter date).
For those adjacent days, the selection should stop at the first record that does not satisfy this condition.
This is what my raw data looks like:
Id | Desc         | EntryDate
----------------------------------------------
1  | Event type 1 | 2021-03-12 21:55:00.000
2  | Event type 1 | 2021-03-12 01:10:00.000
3  | Event type 1 | 2021-03-11 20:17:00.000
4  | Event type 1 | 2021-03-11 05:04:00.000
5  | Event type 1 | 2021-03-10 23:58:00.000
6  | Event type 1 | 2021-03-10 11:01:00.000
7  | Event type 1 | 2021-03-10 10:00:00.000
In this example set, if my filter date is '2021-03-11', my expected result set should be all records from that day plus adjacent records from 03-12 and 03-10 that satisfy the 8-hour condition. Note that the record with Id 7 is not included because the record with Id 6 does not comply:
Id | EntryDate
-------------------------------
2  | 2021-03-12 01:10:00.000
3  | 2021-03-11 20:17:00.000
4  | 2021-03-11 05:04:00.000
5  | 2021-03-10 23:58:00.000
I need advice on how to write this complex query.
This is a variant of gaps-and-islands. Define the difference . . . and then define groups based on the differences:
with e as (
select t.*,
sum(case when prev_entrydate > dateadd(hour, -8, entrydate) then 0 else 1 end) over (order by entrydate) as grp
from (select t.*,
lag(entrydate) over (order by entrydate) as prev_entrydate
from t
) t
)
select e.*
from e
where e.grp in (select e2.grp
from e e2
where convert(date, e2.entrydate) = @filterdate
);
Note: I'm not sure exactly how filter date is applied. This assumes that it is any events on the entire day, which means that there might be multiple groups. If there is only one group (say the first group on the day), the query can be simplified a bit from a performance perspective.
declare @DateTime datetime = '2021-03-11'
select *
from t
where t.EntryDate between DATEADD(hour , -8 , @DateTime) and DATEADD(hour , 32 , @DateTime)
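To make the gaps-and-islands idea above concrete, here is a minimal sketch that runs it on the question's sample data in SQLite through Python's sqlite3 module. It is an adaptation rather than the T-SQL from the answer: SQLite has no dateadd, so the 8-hour gap is computed with julianday, and the filter date is passed as a bound parameter. The CTE name lagged is an assumption made for the example.

```python
import sqlite3

# Sample rows from the question: (id, entrydate).
rows = [
    (1, "2021-03-12 21:55:00"),
    (2, "2021-03-12 01:10:00"),
    (3, "2021-03-11 20:17:00"),
    (4, "2021-03-11 05:04:00"),
    (5, "2021-03-10 23:58:00"),
    (6, "2021-03-10 11:01:00"),
    (7, "2021-03-10 10:00:00"),
]
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, entrydate TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?)", rows)

query = """
WITH lagged AS (
    -- Attach the previous record's timestamp to every record.
    SELECT id, entrydate,
           LAG(entrydate) OVER (ORDER BY entrydate) AS prev_entrydate
    FROM t
),
e AS (
    -- A record starts a new group when it has no predecessor or the gap is
    -- more than 8 hours; the running sum of those flags is the group number.
    SELECT id, entrydate,
           SUM(CASE WHEN prev_entrydate IS NOT NULL
                     AND (julianday(entrydate) - julianday(prev_entrydate)) * 24 <= 8
                    THEN 0 ELSE 1 END) OVER (ORDER BY entrydate) AS grp
    FROM lagged
)
SELECT id, entrydate
FROM e
WHERE grp IN (SELECT grp FROM e WHERE date(entrydate) = ?)
ORDER BY entrydate DESC
"""
result = conn.execute(query, ("2021-03-11",)).fetchall()
for row in result:
    print(row)
```

With the filter date 2021-03-11 this returns the records with Id 2, 3, 4 and 5, which matches the expected result in the question.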

How to obtain information from 10 dates without using 10+ left joins

I have some information as shown in the simplified table below.
login_date | userid
-------------------------
2020-12-01 | 123
2020-12-01 | 456
2020-12-02 | 123
2020-12-02 | 456
2020-12-02 | 789
2020-12-03 | 123
2020-12-03 | 789
The dates in login_date range from 2020-12-01 to 2020-12-12, and each userid appears at most once per day.
What I wish to obtain is twofold:
The number of users who first logged in on a certain date, excluding users who logged in on any preceding day.
For users who first logged in on a certain date (e.g. 2020-12-01), how many of them logged in on subsequent days as well? (i.e. of the batch who first logged in on 2020-12-01, how many also logged in on 2020-12-02, 2020-12-03, and so on)
For the above table, an example of the desired result may be as follows:
| 2020-12-01 | 2020-12-02 | 2020-12-03 | ... (users' first login date)
----------------------------------------------------------------------------------------
| 2020-12-01 | 2 x x
users who continued | 2020-12-02 | 2 1 x
to log in on these | 2020-12-03 | 1 1 0
dates | ... |
Reasoning:
On the first day, two new users logged in, 123 and 456.
On the second day, the same old users, 123 and 456, logged in as well. In addition, a new user (logging in for the first time), 789, was added.
On the third day, only one of the original old users, 123 logged in. (count of 1). The new user (from the second day), 789, logged in as well. (count of 1)
My attempt
I actually managed to obtain a (rough) solution in two parts. For the first day, 2020-12-01, I simply filtered users who logged in on the first day and performed left joins for all the remaining dates:
select count(d1.userid) as d1_users, count(d2.userid) as d2_users, ... (repeated for all joined tables)
from table1 d1
left join (
select userid
from table1
where login_date = date('2020-12-02')
) d2
on d1.userid = d2.userid
... -- (10 more left joins, with each filtering by an incremented date value)
where d1.login_date = date('2020-12-01')
For dates following the second day onwards, I did a bit of preprocessing to exclude users who had logged in on preceding day(s):
with d2_users as (
select userid
from table1 a
left join (
select userid
from table1
where login_date = date('2020-12-01')
) b
on a.userid = b.userid
where b.userid is null -- filtering out users who logged in on preceding day(s)
and a.login_date = date('2020-12-02')
)
select count(d2.userid) as d2_users, ... -- (repeated for all joined tables)
from d2_users d2
left join (
select userid
from table1
where login_date = date('2020-12-03')
) d3
on d2.userid = d3.userid
... -- (similar to the query for the 2020-12-01)
In the process of writing and executing this query it took a lot of manual editing (deleting of unnecessary left joins for later dates and count), and ultimately the entire query for just two days takes up 300+ lines of SQL code. I am not sure whether there is a more efficient process for this.
Any advice would be greatly appreciated! I would be happy to provide further clarification if needed as well since the optimization of the solution to this problem has been bugging me for some time.
I apologize for the poor formatting of the desired result, as I currently only have a representation of it in a spreadsheet and no idea of what it might look like as SQL output.
Edit:
I realized I may not have communicated the ideal outcomes properly. For each min_login_date identified, what I wish to obtain is the number of users who continue to log in from a preceding date. An example would be:
10 users log in on 2020-12-01. Hence, the count for 2020-12-01 = 10.
Of the 10 previous users, 8 users log in on 2020-12-02. Hence the count for 2020-12-02 = 8.
Of the 8 users (from the previous day), 6 users log in on 2020-12-03. Hence the count for 2020-12-03 = 6.
As such for each min_login_date, the user count for subsequent dates should be <= that of the user count for previous dates. Hope this helps! I apologize for any miscommunication.
You can use window functions to get the earliest date. And then aggregate:
select min_login_date, count(*) as num_on_day,
sum(case when login_date = '2020-12-01' then 1 else 0 end) as login_20201201,
sum(case when login_date = '2020-12-02' then 1 else 0 end) as login_20201202,
. . .
from (select t.*,
min(login_date) over (partition by userid) as min_login_date
from t
) t
group by min_login_date
I think you need a small tweak that combines an analytic function with aggregation, as follows:
select login_date,
Count(case when min_login_date = '2020-12-01' then 1 end) as login_20201201,
Count(case when min_login_date = '2020-12-02' then 1 end) as login_20201202,
......
from (select t.*,
min(login_date) over (partition by userid) as min_login_date,
Lag(login_date) over (partition by userid order by login_date) as lag_login_date
from your_table t
Where t.login_date between '2020-12-01' and '2020-12-12'
) t
where (lag_login_date = login_date - interval '1 day' or lag_login_date is null)
group by login_date
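The earliest-login window function approach above can be sketched on the question's sample rows as follows. This is a minimal example in SQLite through Python's sqlite3 module, not the original poster's database. The cohort_size column is an addition for the example: it counts distinct users per cohort, which is an assumption about what the first requested number means.

```python
import sqlite3

# Sample rows copied from the question's simplified table.
rows = [
    ("2020-12-01", 123), ("2020-12-01", 456),
    ("2020-12-02", 123), ("2020-12-02", 456), ("2020-12-02", 789),
    ("2020-12-03", 123), ("2020-12-03", 789),
]
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (login_date TEXT, userid INTEGER)")
conn.executemany("INSERT INTO table1 VALUES (?, ?)", rows)

# Tag every login with the user's first login date, then pivot the logins of
# each first-login cohort into one column per calendar date.
query = """
SELECT min_login_date,
       COUNT(DISTINCT userid) AS cohort_size,
       SUM(CASE WHEN login_date = '2020-12-01' THEN 1 ELSE 0 END) AS login_20201201,
       SUM(CASE WHEN login_date = '2020-12-02' THEN 1 ELSE 0 END) AS login_20201202,
       SUM(CASE WHEN login_date = '2020-12-03' THEN 1 ELSE 0 END) AS login_20201203
FROM (
    SELECT login_date, userid,
           MIN(login_date) OVER (PARTITION BY userid) AS min_login_date
    FROM table1
)
GROUP BY min_login_date
ORDER BY min_login_date
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

The two output rows correspond to the two columns of the desired table in the question (2, 2, 1 for the 2020-12-01 cohort and 1, 1 for the 2020-12-02 cohort). Note that this counts any later login; it does not enforce the strictly consecutive retention described in the question's Edit.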

SQL query: get total values for each month

I have a table that stores the number of fruits sold on each date.
CREATE TABLE data
(
code VARCHAR2(50) NOT NULL,
amount NUMBER(5) NOT NULL,
DATE VARCHAR2(50) NOT NULL
);
Sample data
code |amount| date
------+------+------------
aple | 1 | 01/01/2010
aple | 2 | 02/02/2010
orange| 3 | 03/03/2010
orange| 4 | 04/04/2010
I need to write a query that lists how many apples and oranges were sold in January and February.
--total apple for jan
select sum(amount) from mg.drum d where date >='01/01/2010' and cdate < '01/02/2020' and code = 'aple';
--total apple for feb
select sum(amount) from mg.drum d where date >='01/02/2010' and cdate < '01/03/2020' and code = 'aple';
--total orange for jan
select sum(amount) from mg.drum d where date >='01/01/2010' and cdate < '01/02/2020' and code = 'orange';
--total orange for feb
select sum(amount) from mg.drum d where date >='01/02/2010' and cdate < '01/03/2020' and code = 'orange';
If I need to calculate this for more months and more fruits, it gets tedious. Is there a shorter query I can write?
Can I at least combine the months into 1 query, so that 1 query gets the total for each month for 1 fruit?
You can use conditional aggregation such as
SELECT TO_CHAR("date",'MM/YYYY') AS "Month/Year",
SUM( CASE WHEN code = 'apple' THEN amount END ) AS apple_sold,
SUM( CASE WHEN code = 'orange' THEN amount END ) AS orange_sold
FROM data
WHERE "date" BETWEEN date'2020-01-01' AND date'2020-02-29'
GROUP BY TO_CHAR("date",'MM/YYYY')
Note that date is a reserved keyword and cannot be used as a column name unless it is quoted.
select <month_expression>, sum(amount)
from mg.drum
group by <month_expression>
Here <month_expression> stands for any expression that returns the month number or name.
If you are dealing with months, then you should include the year as well. I would recommend:
SELECT TRUNC(date, 'MON') as yyyymm, code,
SUM(amount)
FROM t
GROUP BY TRUNC(date, 'MON'), code;
You can add a WHERE clause if you want only some dates or codes.
This returns a separate row for each month and code that has data. That is pretty close to the results from your four queries, but it does not return 0 values.
select to_char(date_col,'MONTH') as month, code, sum(amount)
from mg.drum
group by to_char(date_col,'MONTH'), code
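The group-by-month-and-code approach from the answers above can be tried out with a minimal sketch in SQLite through Python's sqlite3 module. Several things here are assumptions for the example: the table name sales and column name sold_on are hypothetical, the dates are treated as DD/MM/YYYY text (as the question's own queries suggest), and the last sample row is an invented extra sale so that one group sums more than one row. SQLite has no TRUNC for dates, so the month key is built with substr.

```python
import sqlite3

# The first four rows are the question's sample data; the last row is a
# hypothetical extra sale so that one group contains more than one row.
rows = [
    ("aple", 1, "01/01/2010"),
    ("aple", 2, "02/02/2010"),
    ("orange", 3, "03/03/2010"),
    ("orange", 4, "04/04/2010"),
    ("aple", 5, "15/01/2010"),
]
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (code TEXT, amount INTEGER, sold_on TEXT)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)

# Build a 'YYYY-MM' key from the DD/MM/YYYY text and total the amounts per
# month and code, so the year is always part of the grouping.
query = """
SELECT substr(sold_on, 7, 4) || '-' || substr(sold_on, 4, 2) AS yyyymm,
       code,
       SUM(amount) AS total
FROM sales
GROUP BY yyyymm, code
ORDER BY yyyymm, code
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

Adding more months or more fruits needs no change to the query; only combinations that have data appear in the output.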

Those who listened to more than 10 mins each month in the last 6 months

I'm trying to figure out the count of users who listened for more than 10 minutes each month in the last 6 months.
We have this event: Song_stopped_listen, and one of its attributes is session_progress_ms.
I'm trying to see the monthly evolution of the count of this cohort over the last 6 months.
I'm using BigQuery and this is the query I tried. I feel that something is off semantically, but I couldn't put my finger on it:
SELECT
CONCAT(CAST(EXTRACT(YEAR FROM DATE (timestamp)) AS STRING),"-",CAST(EXTRACT(MONTH FROM DATE (timestamp)) AS STRING)) AS date
,SUM(absl.session_progress_ms/(1000*60*10)) as total_10_ms, COUNT(DISTINCT u.id) as total_10_listeners
FROM ios.song_stopped_listen as absl
LEFT JOIN ios.users u on absl.user_id = u.id
WHERE absl.timestamp > '2018-05-01'
Group by 1
HAVING(total_10_ms > 1)
Please help figure out what I'm doing wrong here.
Thank you.
data Sample:
user_id | session_progress_ms | timestamp
1 | 10000 | 2017-10-10 14:34:25.656 UTC
What I want to have:
Month-year | Count of users who listened to more than 10 mins
--------------------------------------------------------------
2018-5     | 500
2018-6     | 600
2018-7     | 300
2018-8     | 5100
2018-9     | 4500
2018-10    | 1500
2018-11    | 1500
2018-12    | 2500
Use multiple levels of aggregation:
select user_id
from (select ssl.user_id, timestamp_trunc(ssl.timestamp, month) as mon,
sum(ssl.session_progress_ms/(1000*60)) as total_minutes
from ios.song_stopped_listen as ssl
where date(ssl.timestamp) < date_trunc(current_date, month) and
date(ssl.timestamp) >= date_sub(date_trunc(current_date, month), interval 6 month)
group by 1, 2
) u
where total_minutes >= 10
group by user_id
having count(*) = 6;
To get the count, just use this as a subquery with count(*).
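The two levels of aggregation described above can be illustrated with a minimal sketch in SQLite through Python's sqlite3 module. To keep the output deterministic, the example uses a fixed 3-month window passed as parameters instead of the last 6 months relative to the current date. The sample rows and the column name ts are assumptions made for the example.

```python
import sqlite3

# Hypothetical events: (user_id, session_progress_ms, ts).
rows = [
    (1, 700_000, "2018-10-03 10:00:00"),
    (1, 650_000, "2018-11-07 10:00:00"),
    (1, 400_000, "2018-12-01 10:00:00"),
    (1, 300_000, "2018-12-15 10:00:00"),  # 700,000 ms in total for December
    (2, 900_000, "2018-10-05 10:00:00"),
    (2, 100_000, "2018-11-09 10:00:00"),  # under 10 minutes in November
    (2, 900_000, "2018-12-20 10:00:00"),
]
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE song_stopped_listen (user_id INTEGER, session_progress_ms INTEGER, ts TEXT)"
)
conn.executemany("INSERT INTO song_stopped_listen VALUES (?, ?, ?)", rows)

# Inner level: minutes listened per user and month.
# Outer level: keep users whose qualifying months cover the whole window.
query = """
SELECT user_id
FROM (
    SELECT user_id,
           substr(ts, 1, 7) AS mon,
           SUM(session_progress_ms) / 60000.0 AS total_minutes
    FROM song_stopped_listen
    WHERE ts >= ? AND ts < ?
    GROUP BY user_id, mon
)
WHERE total_minutes >= 10
GROUP BY user_id
HAVING COUNT(*) = ?
"""
result = conn.execute(query, ("2018-10-01", "2019-01-01", 3)).fetchall()
print(result)
```

Only user 1 reaches 10 minutes in every month of the window, so only user 1 is returned. Wrapping this query in an outer count(*) gives the size of the cohort.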

Get historical count and current count on parking data

I've previously gotten very good help here on SO with analyzing parking data. This is my query:
select parking_meter_id, avg(cnt) from
(select parking_meter_id, count(*) as cnt, to_char(start,'YYYYMMDD') as day
from parking_transactions
where start >= now() - interval '3 month' -- last three months
and to_char(start,'YYYYMMDD') < to_char(now(),'YYYYMMDD') -- but not today
and to_char(start,'D') = to_char(now(),'D') -- same weekday
and to_char(now(),'HH24MISS') between to_char(start,'HH24MISS') and to_char(stop,'HH24MISS') -- same time
group by parking_meter_id, to_char(start,'YYYYMMDD') -- group by day
) as parking_transactions group by parking_meter_id
It works and shows the average count of active transactions, because transactions from today (now()) are filtered out.
Is it possible, in the same pass, to have the query also return the currently active transactions:
select count(*) as cnt from parking_transactions where now() between start and stop
so one can easily compare the current status with the historical?
My table structure is:
parking_meter_id, start, stop
Currently I get the following output:
parking_meter_id, avg(cnt) minus today
I would like to have the following output:
parking_meter_id, avg(cnt) minus today, count(*) for today only
The -- but not today comment marks the where clause that ignores today's transactions.
An example of output as of now is the following:
parking_meter_id | cnt | day
------------------+-----+----------
4406 | 1 | 20141217
4406 | 5 | 20150107
4406 | 1 | 20150121
4406 | 3 | 20150128
4406 | 3 | 20150114
I would like to have returned:
parking_meter_id | avg(cnt-without-today) | cnt-day
------------------+------------------------+---------
4406 | 2.6 | 3
Use WITH to define common table expressions for the daily counts and for the average count excluding today, then join them to get the desired result:
WITH daily_count AS -- temp table to store daily counts
(
SELECT parking_meter_id,
COUNT(*) AS cnt,
to_char(start,'YYYYMMDD') AS day
FROM parking_transactions
WHERE start >= now() - interval '3 month' -- last three months
AND to_char(start,'D') = to_char(now(),'D') -- same weekday
AND to_char(now(),'HH24MISS') BETWEEN to_char(start,'HH24MISS') AND to_char(stop,'HH24MISS') -- same time
GROUP BY parking_meter_id,
to_char(start,'YYYYMMDD') -- group by parking meter id and day
), avg_count_minus_today AS -- temp table to store avg count minus today
(
SELECT parking_meter_id,
AVG(cnt) AS avg_count
FROM daily_count
WHERE day < to_char(now(),'YYYYMMDD') -- but not today
GROUP BY parking_meter_id
)
SELECT a.parking_meter_id,
a.avg_count, --avg count minus today
d.cnt AS today_count
FROM avg_count_minus_today a
INNER JOIN daily_count d
ON a.parking_meter_id= d.parking_meter_id AND d.day=to_char(now(),'YYYYMMDD'); --today in daily count
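As a variation on the query above, the historical average and today's count can also be computed from the daily counts in one grouped pass with conditional aggregation. The sketch below is an assumption-laden adaptation in SQLite through Python's sqlite3 module, not the PostgreSQL from the answer: "now" is passed as a fixed parameter so the output is reproducible, the sample rows are invented, and SQLite's strftime, time and date functions replace to_char. Unlike the inner join, this variant also keeps meters that have no active transaction today (their today_count is NULL).

```python
import sqlite3

# Hypothetical transactions: (parking_meter_id, start, stop).
# The fixed "now" below, 2015-02-04 12:00:00, is a Wednesday.
rows = [
    (4406, "2015-01-21 11:00:00", "2015-01-21 13:00:00"),
    (4406, "2015-01-28 10:00:00", "2015-01-28 12:30:00"),
    (4406, "2015-01-28 11:30:00", "2015-01-28 14:00:00"),
    (4406, "2015-01-28 11:59:00", "2015-01-28 12:01:00"),
    (4406, "2015-01-28 14:00:00", "2015-01-28 15:00:00"),  # not active at noon
    (4406, "2015-01-27 11:00:00", "2015-01-27 13:00:00"),  # a Tuesday
    (4406, "2015-02-04 11:00:00", "2015-02-04 13:00:00"),  # today
    (4406, "2015-02-04 11:45:00", "2015-02-04 12:15:00"),  # today
    (5000, "2015-01-28 11:00:00", "2015-01-28 13:00:00"),  # nothing today
]
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE parking_transactions (parking_meter_id INTEGER, start TEXT, stop TEXT)")
conn.executemany("INSERT INTO parking_transactions VALUES (?, ?, ?)", rows)

query = """
WITH daily_count AS (
    -- Per meter and day: transactions active at the same time of day,
    -- on the same weekday, within the last three months (today included).
    SELECT parking_meter_id, date(start) AS day, COUNT(*) AS cnt
    FROM parking_transactions
    WHERE start >= datetime(:now, '-3 months')
      AND strftime('%w', start) = strftime('%w', :now)
      AND time(:now) BETWEEN time(start) AND time(stop)
    GROUP BY parking_meter_id, date(start)
)
SELECT parking_meter_id,
       AVG(CASE WHEN day < date(:now) THEN cnt END) AS avg_count,   -- history only
       MAX(CASE WHEN day = date(:now) THEN cnt END) AS today_count  -- today only
FROM daily_count
GROUP BY parking_meter_id
ORDER BY parking_meter_id
"""
result = conn.execute(query, {"now": "2015-02-04 12:00:00"}).fetchall()
for row in result:
    print(row)
```

For meter 4406 the historical Wednesdays have 1 and 3 active transactions, giving an average of 2.0 next to today's count of 2.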