sql db2 group by first day of each year - sql

I have the following:
id date_start date_end
----- ---------- --------
1a3 2001-12-12 2002-12-12
23b 2005-01-24 2008-11-02
11ad 2012-01-15 2014-13-09
19d 2015-01-23 2016-02-04
I want to count the distinct ids whose date range includes the first day of each year.
For example, I can do:
select count(distinct id) from table
where '2001-01-01' between date_start and date_end
but I want to produce the count for all years from 2000-2015. I want to avoid manually doing:
select count(distinct id) from table
where '2001-01-01' between date_start and date_end
select count(distinct id) from table
where '2002-01-01' between date_start and date_end
select count(distinct id) from table
where '2003-01-01' between date_start and date_end
I am just having trouble visualizing the group by clause for this. If I had just a year column, I could do:
select count(distinct id), year from table
group by year
However, I cannot fit the where '2001-01-01' between date_start and date_end condition into this group by clause.
Can anyone help?
Thanks!

You can use left join. Here is a method:
select y.yyyy, count(distinct t.id)
from ((select '2001-01-01' as yyyy from sysibm.sysdummy1) union all
(select '2002-01-01' as yyyy from sysibm.sysdummy1) union all
(select '2003-01-01' as yyyy from sysibm.sysdummy1)
) y left join
table t
on y.yyyy between date_start and date_end
group by y.yyyy
order by y.yyyy;

Well, you only have to check and count whether the year of date_start is different from the year of date_end.
select count(*)
from table
where year(date_start) < year(date_end)

I hope you don't mind me answering my own question. Using Michael's logic, I was able to combine this with a dummy table, an idea from Gordon's answer. Here is what I did:
with dummy(yr) as (
select 2000 from SYSIBM.SYSDUMMY1
union all
select yr + 1 from dummy where yr < 2015
)
select d.yr, count(distinct t.id)
from table t, dummy d
where d.yr between year(t.date_start) + 1 and year(t.date_end)
group by d.yr
order by d.yr
As I say, I hope the community does not mind me combining parts of both posters' answers to get the solution I was looking for. I did not want hardcoding, and I think the answer I posted satisfies this while using a simple piece of date arithmetic.
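The generated-years idea above can be tried out locally. The following is a minimal sketch in Python with SQLite rather than DB2, so the syntax differs slightly (WITH RECURSIVE, printf). The table name ranges, the function name count_ids_on_jan_first and the row x9 are hypothetical, and the end date of 11ad is assumed to be 2014-09-13 because the date in the question is not valid. It compares January 1 of each year directly against the range instead of using year arithmetic, which also covers a range that starts exactly on January 1, and it uses a left join so that years with no matching id show a count of 0.

```python
import sqlite3

# Sample rows modelled on the question. The end date of 11ad and the
# whole row x9 are assumptions made for illustration.
SAMPLE_ROWS = [
    ("1a3", "2001-12-12", "2002-12-12"),
    ("23b", "2005-01-24", "2008-11-02"),
    ("11ad", "2012-01-15", "2014-09-13"),
    ("19d", "2015-01-23", "2016-02-04"),
    ("x9", "2010-01-01", "2010-06-30"),  # starts exactly on January 1
]


def count_ids_on_jan_first(rows, first_year=2000, last_year=2015):
    """Return [(year, distinct id count)] for ranges covering January 1."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE ranges (id TEXT, date_start TEXT, date_end TEXT)")
    con.executemany("INSERT INTO ranges VALUES (?, ?, ?)", rows)
    sql = """
        WITH RECURSIVE years(yr) AS (
            SELECT ?
            UNION ALL
            SELECT yr + 1 FROM years WHERE yr < ?
        )
        SELECT y.yr, COUNT(DISTINCT r.id)
        FROM years y
        LEFT JOIN ranges r
               ON printf('%04d-01-01', y.yr) BETWEEN r.date_start AND r.date_end
        GROUP BY y.yr
        ORDER BY y.yr
    """
    result = con.execute(sql, (first_year, last_year)).fetchall()
    con.close()
    return result
```

Calling count_ids_on_jan_first(SAMPLE_ROWS) returns one row per year from 2000 to 2015, including the years in which no range covers January 1.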

Related

How to find the number of occurences within a date range?

Let's say I have hospital visits in the table TestData
I would like to know which patients have had a second hospital visit within 7 days of their first hospital visit.
How would I code this in SQL?
I have patient_id as a TEXT
date_visit is also TEXT and takes the format MM/DD/YYYY.
patient_id date_visit
---------- ----------
A123B29133 07/12/2011
A123B29133 07/14/2011
A123B29133 07/20/2011
A123B29134 12/05/2016
In the above table, patient A123B29133 fulfills the condition, as they were seen on 07/14/2011, which is less than 7 days after 07/12/2011.
You can use a subquery with exists:
with to_d(id, v_date) as (
select patient_id, substr(date_visit, 7, 4)||'-'||substr(date_visit, 1, 2)||'-'||substr(date_visit, 4, 2) from visits
)
select t2.id from (select t1.id, min(t1.v_date) d1 from to_d t1 group by t1.id) t2
where exists (select 1 from to_d t3 where t3.id = t2.id and t3.v_date != t2.d1 and t3.v_date <= date(t2.d1, '+7 days'))
id
A123B29133
Since your date column is not in YYYY-MM-DD, which is the default format used by several SQLite date functions, the substr function was used to transform your dates into this format. JulianDay was then used to convert the dates to a numeric value, which eases the 7-day comparison. The MIN window function was used to identify the first hospital visit date for each patient. The demo fiddle and samples show the query that was used to transform the data and the results before the final query, which filters based on your requirements, i.e. < 7 days. With this approach using window functions, you may also retrieve the visit_date and the number of days since the first visit date if desired.
You may read more about sqlite date functions here.
Query #1
SELECT
patient_id,
visit_date,
JulianDay(visit_date) -
MIN(JulianDay(visit_date)) OVER (PARTITION BY patient_id)
as num_of_days_since_first_visit
FROM
(
SELECT
*,
(
substr(date_visit,7) || '-' ||
substr(date_visit,1,2) || '-' ||
substr(date_visit,4,2)
) as visit_date
FROM
visits
) v;
patient_id visit_date num_of_days_since_first_visit
---------- ---------- -----------------------------
A123B29133 2011-07-12 0
A123B29133 2011-07-14 2
A123B29133 2011-07-20 8
A123B29134 2016-12-05 0
Query #2
The below is your desired query, which uses the previous query as a CTE and applies the filter for visits less than 7 days. num_of_days <> 0 is applied to remove entries where the first date is also the date of the record.
WITH num_of_days_since_first_visit AS (
SELECT
patient_id,
visit_date,
JulianDay(visit_date) - MIN(JulianDay(visit_date)) OVER (PARTITION BY patient_id) num_of_days
FROM
(
SELECT
*,
(
substr(date_visit,7) || '-' ||
substr(date_visit,1,2) || '-' ||
substr(date_visit,4,2)
) as visit_date
FROM
visits
) v
)
SELECT DISTINCT
patient_id
FROM
num_of_days_since_first_visit
WHERE
num_of_days <> 0 AND num_of_days < 7;
patient_id
A123B29133
View on DB Fiddle
Let me know if this works for you.
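For anyone who wants to check the logic without a fiddle, here is a small sketch run through Python's sqlite3 module. It is a variation on the queries above, not the same query: it finds each patient's first visit with a plain GROUP BY instead of a window function, so it should also run on older SQLite builds, and it treats "within 7 days" as at most 7 days (use < instead of <= for strictly less than 7). The function name patients_with_quick_return and the two H... patients are hypothetical additions for illustration.

```python
import sqlite3

# Rows from the question plus two hypothetical patients: one whose second
# visit is 8 days after the first, and one whose visits straddle New Year.
SAMPLE_VISITS = [
    ("A123B29133", "07/12/2011"),
    ("A123B29133", "07/14/2011"),
    ("A123B29133", "07/20/2011"),
    ("A123B29134", "12/05/2016"),
    ("H000000001", "01/01/2020"),
    ("H000000001", "01/09/2020"),
    ("H000000002", "12/30/2019"),
    ("H000000002", "01/02/2020"),
]


def patients_with_quick_return(rows, max_days=7):
    """Return patient ids with a later visit within max_days of their first."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE visits (patient_id TEXT, date_visit TEXT)")
    con.executemany("INSERT INTO visits VALUES (?, ?)", rows)
    sql = """
        WITH v AS (
            -- Rebuild MM/DD/YYYY text as YYYY-MM-DD so it sorts and parses.
            SELECT patient_id,
                   substr(date_visit, 7, 4) || '-' ||
                   substr(date_visit, 1, 2) || '-' ||
                   substr(date_visit, 4, 2) AS visit_date
            FROM visits
        ),
        firsts AS (
            SELECT patient_id, MIN(visit_date) AS first_date
            FROM v
            GROUP BY patient_id
        )
        SELECT DISTINCT v.patient_id
        FROM v
        JOIN firsts f ON f.patient_id = v.patient_id
        WHERE v.visit_date > f.first_date
          AND julianday(v.visit_date) - julianday(f.first_date) <= ?
        ORDER BY v.patient_id
    """
    result = [row[0] for row in con.execute(sql, (max_days,))]
    con.close()
    return result
```

The New Year patient is there to show why the conversion to YYYY-MM-DD matters: as MM/DD/YYYY text, 01/02/2020 would sort before 12/30/2019.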
I would like to know which patients have had a second hospital visit within 7 days of their first hospital visit.
You can use lag(). The following gets all rows where this is true:
select t.*
from (select t.*,
lag(date_visit) over (partition by patient_id order by date_visit) as prev_date_visit
from t
) t
where prev_date_visit >= date(date_visit, '-7 day');
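If you just want the patient_ids, you can use select distinct patient_id. Note that this compares each visit with the previous visit rather than with the first one, and that date() and the order by only behave correctly if date_visit is stored as YYYY-MM-DD.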

Count number of ids by Month SQL

I have a table like this, I hope to count the number of ids by month. I used the following code but it does not work.
id date_time
1390880502018723840,2021-05-08
1390881127930372100,2021-05-08
1390881498270736386,2021-05-08
SELECT twitter.tweets.id
WHERE Month(twitter.tweets.date_time)=01 AND Year(twitter.tweets.date_time)=2021 ;
You have to use the count() function, and to_char to get the year-month part of the date in one column:
SELECT count(twitter.tweets.id)
FROM twitter.tweets
WHERE to_char(twitter.tweets.date_time,'YYYY-MM') = '2021-01';
You can generalize it to all months and years by using group by:
SELECT to_char(twitter.tweets.date_time,'YYYY-MM'), count(twitter.tweets.id)
FROM twitter.tweets
GROUP BY to_char(twitter.tweets.date_time,'YYYY-MM');
To get counts for all months since Jan 2021:
SELECT date_trunc('month', date_time), count(*)
FROM twitter.tweets
WHERE date_time >= '2021-01-01'
GROUP BY 1;
If id can be NULL (which should be disallowed for an id column), use the slightly more expensive count(id) instead.
Count of distinct IDs:
SELECT date_trunc('month', date_time), count(DISTINCT id)
FROM twitter.tweets
WHERE date_time >= '2021-01-01'
GROUP BY 1;
For only Jan 2021:
SELECT count(DISTINCT id)
FROM twitter.tweets
WHERE date_time >= '2021-01-01'
AND date_time < '2021-02-01';
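The per-month grouping can be tried out with a small sketch in Python and SQLite. SQLite has neither to_char nor date_trunc, so strftime('%Y-%m', ...) stands in for them here; that substitution, the function name tweets_per_month, the unqualified table name tweets and the extra January row are all assumptions for illustration.

```python
import sqlite3

# The three rows from the question plus one hypothetical January tweet.
SAMPLE_TWEETS = [
    (1390880502018723840, "2021-05-08"),
    (1390881127930372100, "2021-05-08"),
    (1390881498270736386, "2021-05-08"),
    (1, "2021-01-15"),
]


def tweets_per_month(rows):
    """Return [(YYYY-MM, distinct id count)] ordered by month."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE tweets (id INTEGER, date_time TEXT)")
    con.executemany("INSERT INTO tweets VALUES (?, ?)", rows)
    sql = """
        SELECT strftime('%Y-%m', date_time) AS month, COUNT(DISTINCT id)
        FROM tweets
        GROUP BY month
        ORDER BY month
    """
    result = con.execute(sql).fetchall()
    con.close()
    return result
```

With the sample rows this gives one row for 2021-01 and one for 2021-05; months without any tweets do not appear, because nothing is joined against a list of months.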

Oracle SQL - count by day - not a GROUP BY expression

I have data which looks like this:
TIME ID
29/11/20 13:45:33,810000000 1234
06/01/21 13:45:33,810000000 5678
06/01/21 14:05:33,727000000 5678
That means, I have a column TIME and ID. What I want to do is to count all the entries by day and all the distinct IDs per day.
As result I would like to get this:
DAY COUNT(*) distinctID
29/11/20 1 1
06/01/21 2 1
I did this:
select trunc(to_char(TIME, 'HH'),'DD/MM/YY'),
COUNT(*), count(distinct ID) as distinctID from CDRHEADER
where TIME>= date '2021-03-01'
group by trunc(TIME,'DD/MM/YY')
order by trunc(TIME,'DD/MM/YY');
As error I get: not a GROUP BY Expression.
Furthermore, I am also not sure about the date operations I executed and if they are correct.
NOTE: I would like to use the date entries as date values and not compare strings or something like this.
How can I get what I expect?
Hmmm . . . I think you want:
select trunc(time) as the_date,
count(*), count(distinct ID) as distinctID
from CDRHEADER
where time >= date '2021-03-01'
group by trunc(time)
order by trunc(time);
I'm not sure why you are using to_char() or 'HH'. If you really want to output the time as 'DD/MM/YYYY', then:
select to_char(trunc(time), 'DD/MM/YYYY') as the_date,
count(*), count(distinct ID) as distinctID
from CDRHEADER
where time >= date '2021-03-01'
group by trunc(time)
order by trunc(time);
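The same grouping can be checked outside Oracle with a rough sketch in Python and SQLite, where date(...) plays the role of trunc(time). The column name event_time, the function name counts_per_day and the ISO-formatted timestamps are assumptions; the where filter from the query above is left out because all three sample rows fall before 2021-03-01.

```python
import sqlite3

# The three sample rows from the question, rewritten as ISO timestamps.
SAMPLE_EVENTS = [
    ("2020-11-29 13:45:33.810", 1234),
    ("2021-01-06 13:45:33.810", 5678),
    ("2021-01-06 14:05:33.727", 5678),
]


def counts_per_day(rows):
    """Return [(day, row count, distinct id count)] ordered by day."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE cdrheader (event_time TEXT, id INTEGER)")
    con.executemany("INSERT INTO cdrheader VALUES (?, ?)", rows)
    sql = """
        SELECT date(event_time) AS the_day, COUNT(*), COUNT(DISTINCT id)
        FROM cdrheader
        GROUP BY the_day
        ORDER BY the_day
    """
    result = con.execute(sql).fetchall()
    con.close()
    return result
```

The point carried over from the answer is that the select list and the group by use the same day-level expression, which is what avoids the "not a GROUP BY expression" error in Oracle.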

Finding id's available in previous weeks but not in current week

How can I find ids that were present in previous weeks but are not present in the current week, on a rolling basis? For example:
Week1 has id 1,2,3,4,5
Week2 has id 3,4,5,7,8
Week3 has id 1,3,5,10,11
So I found out that ids 1 and 2 are missing in week 2, and ids 2, 4, 7 and 8 are missing in week 3 compared with the previous two weeks. But how can I do this on a rolling window for a large amount of data distributed over a period of 20+ years?
Please find the sample dataset and expected output below. I am expecting the output to be partitioned by the WEEK_END date.
Dataset
ID|WEEK_START|WEEK_END|APPEARING_DATE
7152|2015-12-27|2016-01-02|2015-12-27
8350|2015-12-27|2016-01-02|2015-12-27
7152|2015-12-27|2016-01-02|2015-12-29
4697|2015-12-27|2016-01-02|2015-12-30
7187|2015-12-27|2016-01-02|2015-01-01
8005|2015-12-27|2016-01-02|2015-12-27
8005|2015-12-27|2016-01-02|2015-12-29
6254|2016-01-03|2016-01-09|2016-01-03
7962|2016-01-03|2016-01-09|2016-01-04
3339|2016-01-03|2016-01-09|2016-01-06
7834|2016-01-03|2016-01-09|2016-01-03
7962|2016-01-03|2016-01-09|2016-01-05
7152|2016-01-03|2016-01-09|2016-01-07
8350|2016-01-03|2016-01-09|2016-01-09
2403|2016-01-10|2016-01-16|2016-01-10
0157|2016-01-10|2016-01-16|2016-01-11
2228|2016-01-10|2016-01-16|2016-01-14
4697|2016-01-10|2016-01-16|2016-01-14
Expected Output
Partition1: WEEK_END=2016-01-02
ID|MAX(LAST_APPEARING_DATE)
7152|2015-12-29
8350|2015-12-27
4697|2015-12-30
7187|2015-01-01
8005|2015-12-29
Partition2: WEEK_END=2016-01-09
ID|MAX(LAST_APPEARING_DATE)
7152|2016-01-07
8350|2016-01-09
4697|2015-12-30
7187|2015-01-01
8005|2015-12-29
6254|2016-01-03
7962|2016-01-05
3339|2016-01-06
7834|2016-01-03
Partition3: WEEK_END=2016-01-16
ID|MAX(LAST_APPEARING_DATE)
7152|2016-01-07
8350|2016-01-09
4697|2016-01-14
7187|2015-01-01
8005|2015-12-29
6254|2016-01-03
7962|2016-01-05
3339|2016-01-06
7834|2016-01-03
2403|2016-01-10
0157|2016-01-11
2228|2016-01-14
Please use the query below:
select ID, MAX(APPEARING_DATE) from table_name
group by ID, WEEK_END;
Or, including WEEK_END:
select ID, WEEK_END, MAX(APPEARING_DATE) from table_name
group by ID, WEEK_END;
You can use aggregation:
select t.id, max(week_end)
from t
group by id
having max(week_end) < '2016-01-02';
Adjust the date in the having clause for the week end that you want.
Actually, your question is a bit unclear. I'm not sure if a later week end would keep the row or not. If you want "as of" data, then include a where clause:
select t.id, max(week_end)
from t
where week_end < '2016-01-02'
group by id
having max(week_end) < '2016-01-02';
If you want this for a range of dates, then you can use a derived table:
select we.the_week_end, t.id, max(week_end)
from (select '2016-01-02' as the_week_end union all
select '2016-01-09' as the_week_end
) we cross join
t
where t.week_end < we.the_week_end
group by id, we.the_week_end
having max(t.week_end) < we.the_week_end;
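One way to read the expected output in the question is "for every week end, the last appearing date of every id seen up to and including that week". Under that assumption, a sketch in Python and SQLite could look like the following. The table name appearances, the function name last_seen_per_week and the extra missing_this_week flag are hypothetical, and only a few rows of the dataset are used.

```python
import sqlite3

# A subset of the question's dataset: (id, week_end, appearing_date).
SAMPLE_APPEARANCES = [
    ("7152", "2016-01-02", "2015-12-27"),
    ("7152", "2016-01-02", "2015-12-29"),
    ("4697", "2016-01-02", "2015-12-30"),
    ("7152", "2016-01-09", "2016-01-07"),
    ("4697", "2016-01-16", "2016-01-14"),
]


def last_seen_per_week(rows):
    """Return (week_end, id, last_seen, missing_this_week) per week and id."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE appearances (id TEXT, week_end TEXT, appearing_date TEXT)"
    )
    con.executemany("INSERT INTO appearances VALUES (?, ?, ?)", rows)
    sql = """
        SELECT w.week_end,
               a.id,
               MAX(a.appearing_date) AS last_seen,
               -- 1 when the id's latest week is earlier than this week
               MAX(a.week_end) < w.week_end AS missing_this_week
        FROM (SELECT DISTINCT week_end FROM appearances) AS w
        JOIN appearances AS a ON a.week_end <= w.week_end
        GROUP BY w.week_end, a.id
        ORDER BY w.week_end, a.id
    """
    result = con.execute(sql).fetchall()
    con.close()
    return result
```

Filtering on missing_this_week = 1 would give the ids that appeared in earlier weeks but not in the week being reported. The join compares every row with every later week, so on 20+ years of data it may need an index on week_end or a different strategy.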

Total Count of Active Employees by Date

I have in the past written queries that give me counts by date (hires, terminations, etc...) as follows:
SELECT per.date_start AS "Date",
COUNT(peo.EMPLOYEE_NUMBER) AS "Hires"
FROM hr.per_all_people_f peo,
hr.per_periods_of_service per
WHERE per.date_start BETWEEN peo.effective_start_date AND peo.EFFECTIVE_END_DATE
AND per.date_start BETWEEN :PerStart AND :PerEnd
AND per.person_id = peo.person_id
GROUP BY per.date_start
I am now looking to create a count of active employees by date; however, I am not sure how to date the query, since I use a range to determine who is active:
SELECT COUNT(peo.EMPLOYEE_NUMBER) AS "CT"
FROM hr.per_all_people_f peo
WHERE peo.current_employee_flag = 'Y'
and TRUNC(sysdate) BETWEEN peo.effective_start_date AND peo.EFFECTIVE_END_DATE
Here is a simple way to get started. This works for all the effective and end dates in your data:
select thedate,
SUM(num) over (order by thedate) as numActives
from ((select effective_start_date as thedate, 1 as num from hr.per_periods_of_service) union all
(select effective_end_date as thedate, -1 as num from hr.per_periods_of_service)
) dates
It works by adding one person for each start and subtracting one for each end (via num) and doing a cumulative sum. This might have duplicate dates, so you might also do an aggregation to eliminate those duplicates:
select thedate, max(numActives)
from (select thedate,
SUM(num) over (order by thedate) as numActives
from ((select effective_start_date as thedate, 1 as num from hr.per_periods_of_service) union all
(select effective_end_date as thedate, -1 as num from hr.per_periods_of_service)
) dates
) t
group by thedate;
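The +1/-1 idea is independent of SQL, and a short Python sketch may make the arithmetic easier to follow. The function name active_counts and the sample periods are hypothetical. As in the queries above, an employee stops being counted on the end date itself, and open-ended periods (end of None) never subtract.

```python
from collections import Counter
from datetime import date
from itertools import accumulate

# Hypothetical employment periods: (start, end); end=None means still active.
SAMPLE_PERIODS = [
    (date(2012, 4, 1), date(2012, 4, 10)),
    (date(2012, 4, 5), None),
    (date(2012, 4, 5), date(2012, 4, 20)),
]


def active_counts(periods):
    """Return [(date, active employees)] for every date where the count changes."""
    deltas = Counter()
    for start, end in periods:
        deltas[start] += 1      # one more active employee from the start date
        if end is not None:
            deltas[end] -= 1    # one fewer from the end date
    days = sorted(deltas)
    # Summing the net change per date avoids duplicate dates in the output.
    running = accumulate(deltas[day] for day in days)
    return list(zip(days, running))
```

With the sample periods the running total goes 1, 3, 2, 1 on April 1, 5, 10 and 20. Only dates on which something starts or ends appear in the result.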
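If you really want all dates, then it is best to start with a calendar table, and use a simple variation on your original query (counting a column from the joined table, so that dates with no active periods yield 0 rather than 1):
select c.thedate, count(pos.effective_start_date) as NumActives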
from calendar c left outer join
hr.per_periods_of_service pos
on c.thedate between pos.effective_start_date and pos.effective_end_date
group by c.thedate;
If you want to count all employees who were active during the entire input date range:
SELECT COUNT(peo.EMPLOYEE_NUMBER) AS "CT"
FROM hr.per_all_people_f peo
WHERE peo.EFFECTIVE_START_DATE <= :StartDate
AND (peo.EFFECTIVE_END_DATE IS NULL OR peo.EFFECTIVE_END_DATE >= :EndDate)
Here is my example based on Gordon Linoff's answer, with a small modification: in the subtracting subquery, every record appeared with -1 in NUM even when its END DATE was NULL, so those rows are now filtered out.
use AdventureWorksDW2012 -- selects the database to work with in MS SSMS
-- and may not work on other platforms
select
t.thedate
,max(t.numActives) AS "Total Active Employees"
from (
select
dates.thedate
,SUM(dates.num) over (order by dates.thedate) as numActives
from
(
(
select
StartDate as thedate
,1 as num
from DimEmployee
)
union all
(
select
EndDate as thedate
,-1 as num
from DimEmployee
where EndDate IS NOT NULL
)
) AS dates
) AS t
group by thedate
ORDER BY thedate
This worked for me; I hope it helps somebody.
I was able to get the results I was looking for with the following:
--Active Team Members by Date
SELECT "a_date",
COUNT(peo.EMPLOYEE_NUMBER) AS "CT"
FROM hr.per_all_people_f peo,
(SELECT DATE '2012-04-01'-1 + LEVEL AS "a_date"
FROM dual
CONNECT BY LEVEL <= DATE '2012-04-30'+2 - DATE '2012-04-01'-1
)
WHERE peo.current_employee_flag = 'Y'
AND "a_date" BETWEEN peo.effective_start_date AND peo.EFFECTIVE_END_DATE
GROUP BY "a_date"
ORDER BY "a_date"
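The same "generate the dates, then join on the range" pattern can be sketched outside Oracle. The version below uses Python and SQLite, with a recursive CTE standing in for CONNECT BY LEVEL; the table name periods, the function name active_per_day, the sample rows and the 9999-12-31 open-ended date are assumptions for illustration. It uses a left join and counts a column from the joined table, so days with nobody active show 0.

```python
import sqlite3

# Hypothetical periods: (employee, start_date, end_date) as ISO text.
SAMPLE_EMPLOYEES = [
    ("e1", "2012-03-15", "2012-04-10"),
    ("e2", "2012-04-15", "9999-12-31"),
]


def active_per_day(periods, first_day, last_day):
    """Return [(day, active count)] for every day from first_day to last_day."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE periods (emp TEXT, start_date TEXT, end_date TEXT)")
    con.executemany("INSERT INTO periods VALUES (?, ?, ?)", periods)
    sql = """
        WITH RECURSIVE calendar(thedate) AS (
            SELECT ?
            UNION ALL
            SELECT date(thedate, '+1 day') FROM calendar WHERE thedate < ?
        )
        SELECT c.thedate, COUNT(p.emp)
        FROM calendar c
        LEFT JOIN periods p
               ON c.thedate BETWEEN p.start_date AND p.end_date
        GROUP BY c.thedate
        ORDER BY c.thedate
    """
    result = con.execute(sql, (first_day, last_day)).fetchall()
    con.close()
    return result
```

For example, active_per_day(SAMPLE_EMPLOYEES, "2012-04-01", "2012-04-30") returns 30 rows, one per day of April 2012.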