SQL count statement with multiple date ranges

I have two tables with different appointment dates.
Table 1
id start date
1 5/1/14
2 3/2/14
3 4/5/14
4 9/6/14
5 10/7/14
Table 2
id start date
1 4/7/14
1 4/10/14
1 7/11/13
2 2/6/14
2 2/7/14
3 1/1/14
3 1/2/14
3 1/3/14
If I had fixed date ranges, I could count each appointment date just fine, but the date range has to change.
For each id in table 1, I need to count the distinct appointment dates from table 2, but only those in the 6 months before that id's start date in table 1.
Example: count all distinct appointment dates for id 1 (in table 2) between 11/1/13 and 5/1/14 (the 6 months before 5/1/14). The result is 2: 4/7/14 and 4/10/14 are within the range, and 7/11/13 is outside it.
So my issue is that the range changes for each record, and I can't figure out how to code this. For id 2, the date range will be 9/2/13-3/2/14, and so on.
Thanks everyone in advance!

Try this out:
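SELECT id,
(
-- distinct appointment dates for this id in the 6 months up to its start date
SELECT COUNT(DISTINCT table2.start_date)
FROM table2
WHERE table2.id = table1.id
AND table2.start_date >= DATEADD(MM,-6,table1.start_date)
AND table2.start_date <= table1.start_date
) AS table2records
FROM table1
The DATEADD subtracts 6 months from the date in table1, and the correlated subquery returns the count of distinct related dates inside that window.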

I think what you want is a type of join.
select t1.id, count(distinct t2.startdate) as numt2dates
from table1 t1 left outer join
table2 t2
on t1.id = t2.id and
t2.startdate between dateadd(month, -6, t1.startdate) and t1.startdate
group by t1.id;
The exact syntax for the date arithmetic depends on the database.
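As an illustration, here is a minimal runnable sketch of both answers in SQLite, driven from Python's sqlite3 module. It assumes the question's dates are rewritten as ISO strings (SQLite's date functions expect ISO-8601), and it uses date(start_date, '-6 months') as SQLite's stand-in for DATEADD. The correlated-subquery form and the LEFT JOIN form should agree, and both use COUNT(DISTINCT ...) because the question asks for distinct appointment dates.

```python
import sqlite3

def six_month_counts():
    conn = sqlite3.connect(":memory:")
    # Sample data from the question, dates converted to ISO format.
    conn.executescript("""
        CREATE TABLE table1 (id INTEGER, start_date TEXT);
        CREATE TABLE table2 (id INTEGER, start_date TEXT);
        INSERT INTO table1 VALUES
            (1, '2014-05-01'), (2, '2014-03-02'), (3, '2014-04-05'),
            (4, '2014-09-06'), (5, '2014-10-07');
        INSERT INTO table2 VALUES
            (1, '2014-04-07'), (1, '2014-04-10'), (1, '2013-07-11'),
            (2, '2014-02-06'), (2, '2014-02-07'),
            (3, '2014-01-01'), (3, '2014-01-02'), (3, '2014-01-03');
    """)
    # Correlated subquery: the window is recomputed for every table1 row.
    subquery = conn.execute("""
        SELECT t1.id,
               (SELECT COUNT(DISTINCT t2.start_date)
                FROM table2 t2
                WHERE t2.id = t1.id
                  AND t2.start_date BETWEEN date(t1.start_date, '-6 months')
                                        AND t1.start_date) AS cnt
        FROM table1 t1
        ORDER BY t1.id
    """).fetchall()
    # LEFT JOIN + GROUP BY: the range test sits in ON, so ids without matches keep a 0 count.
    joined = conn.execute("""
        SELECT t1.id, COUNT(DISTINCT t2.start_date) AS cnt
        FROM table1 t1
        LEFT JOIN table2 t2
          ON t2.id = t1.id
         AND t2.start_date BETWEEN date(t1.start_date, '-6 months') AND t1.start_date
        GROUP BY t1.id
        ORDER BY t1.id
    """).fetchall()
    conn.close()
    return dict(subquery), dict(joined)

sub, joined = six_month_counts()
print(sub)  # {1: 2, 2: 2, 3: 3, 4: 0, 5: 0}
```

Putting the date condition in the ON clause rather than in WHERE is what keeps ids 4 and 5 in the result with a count of 0.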

Thank you, this solved my issue. This may not help you, since you are not grouping by date, but the answer gave me the insight I needed to resolve the issue I was facing.
I was trying to count the total users for a date criterion that had to be evaluated against multiple fields (a start date and an end date).
WITH data AS (
-- one row per week from 2020-01-01 up to now
SELECT generate_series(
(date '2020-01-01')::timestamp,
NOW(),
INTERVAL '1 week'
) AS date
)
-- for each weekly date, count the distinct users whose startDate/endDate span contains it
SELECT d.date, (SELECT COUNT(DISTINCT h.id)
FROM history h WHERE h.startDate < d.date AND h.endDate > d.date) AS total_records
FROM data d ORDER BY d.date DESC
Sample output (date, total_records):
2022-05-16, 15
2022-05-09, 13
2022-05-02, 13
...
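To try the weekly-snapshot pattern without PostgreSQL, here is a sketch in SQLite through Python's sqlite3. Standard SQLite builds generally lack generate_series, so a recursive CTE produces the weekly dates instead. The history rows and the fixed end date (used in place of NOW() so the result is reproducible) are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE history (id INTEGER, startDate TEXT, endDate TEXT);
    -- Hypothetical rows; id 1 has two overlapping spans and must be counted once.
    INSERT INTO history VALUES
        (1, '2022-01-01', '2022-02-01'),
        (1, '2022-01-15', '2022-03-01'),
        (2, '2022-01-05', '2022-01-12'),
        (3, '2022-01-20', '2022-12-31');
""")
weekly = conn.execute("""
    WITH RECURSIVE weeks(d) AS (
        SELECT '2022-01-03'                      -- first snapshot date
        UNION ALL
        SELECT date(d, '+7 days') FROM weeks     -- step one week at a time
        WHERE date(d, '+7 days') <= '2022-01-31' -- fixed end date instead of NOW()
    )
    SELECT w.d,
           (SELECT COUNT(DISTINCT h.id)
            FROM history h
            WHERE h.startDate < w.d AND h.endDate > w.d) AS total_records
    FROM weeks w
    ORDER BY w.d DESC
""").fetchall()
conn.close()
print(weekly)
```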

Related

SQL - Get historic count of rows collected within a certain period by date

For many years I've been collecting data, and I'm interested in the historical counts of IDs that appeared in the last 30 days. The source looks like this:
id  dates
1   2002-01-01
2   2002-01-01
3   2002-01-01
... ...
3   2023-01-10
To get the count of IDs that appeared in the last 30 days, I would do something like this:
with total_counter as (
select id, count(id) counts
from source
group by id
),
unique_obs as (
select id
from source
where dates >= DATEADD(Day ,-30, current_date)
group by id
)
select count(distinct unique_obs.id)
from unique_obs
left join total_counter
on total_counter.id = unique_obs.id;
The problem is that this returns a single result: the count as of today, as given by current_date.
I would like a table of such counts, as if I had run this analysis yesterday, the day before, and so on. The expected result would be something like:
counts  date
1235    2023-01-10
1234    2023-01-09
1265    2023-01-08
...     ...
7383    2022-12-11
For example, if current_date were 2023-01-10, my query would have returned 1235.
If you need a distinct count of IDs over the 30 days up to and including each date, the query below should work:
WITH CTE_DATES
AS
(
--Create a list of anchor dates
SELECT DISTINCT
dates
FROM source
)
SELECT COUNT(DISTINCT s.id) AS "counts"
,D.dates AS "date"
FROM CTE_DATES D
LEFT JOIN source S ON S.dates BETWEEN DATEADD(DAY,-29,D.dates) AND D.dates --30 DAYS INCLUSIVE
GROUP BY D.dates
ORDER BY D.dates DESC
;
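The anchor-date join can be tried end to end in SQLite via Python's sqlite3, where date(d, '-29 days') plays the role of DATEADD(DAY, -29, d). The rows below are hypothetical; id 1 appears twice inside the 2023-01-15 window but is counted once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE source (id INTEGER, dates TEXT);
    INSERT INTO source VALUES
        (1, '2023-01-01'), (2, '2023-01-01'), (1, '2023-01-15'),
        (3, '2023-01-20'), (2, '2023-02-05');
""")
rolling = conn.execute("""
    WITH cte_dates AS (SELECT DISTINCT dates FROM source)  -- anchor dates
    SELECT d.dates, COUNT(DISTINCT s.id) AS counts
    FROM cte_dates d
    LEFT JOIN source s
      ON s.dates BETWEEN date(d.dates, '-29 days') AND d.dates  -- 30 days inclusive
    GROUP BY d.dates
    ORDER BY d.dates DESC
""").fetchall()
conn.close()
print(rolling)
```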
If the distinct count doesn't matter, you could likely simplify with a rolling sum that hits the source table only once:
SELECT S.dates AS "date"
,COUNT(1) AS "count_daily"
-- window over the per-day counts; an aggregate inside the window avoids reusing a column alias
,SUM(COUNT(1)) OVER(ORDER BY S.dates DESC ROWS BETWEEN CURRENT ROW AND 29 FOLLOWING) AS "count_rolling" --assumes there is at least one row for every day.
FROM source S
GROUP BY S.dates
ORDER BY S.dates DESC;
This won't work if there are gaps in your dates, though, because the window covers the 30 most recent dates present in the table rather than 30 calendar days. In that case, the first example with COUNT(s.id) instead of COUNT(DISTINCT s.id) will do the trick.
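The gap caveat is easy to see with two hypothetical days two months apart. This SQLite sketch (via Python's sqlite3; window functions need SQLite 3.25 or later) computes the daily counts first and then applies the row-based window, next to a calendar-based join for comparison.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE source (id INTEGER, dates TEXT);
    -- Two days of data with a two-month gap between them (hypothetical).
    INSERT INTO source VALUES (1, '2023-01-01'), (2, '2023-03-01');
""")

# Row-based rolling sum: the current row plus the next 29 rows in DESC order,
# i.e. the 30 most recent dates that exist in the table, gaps or not.
rows_based = dict(conn.execute("""
    WITH daily AS (SELECT dates, COUNT(*) AS cnt FROM source GROUP BY dates)
    SELECT dates,
           SUM(cnt) OVER (ORDER BY dates DESC
                          ROWS BETWEEN CURRENT ROW AND 29 FOLLOWING)
    FROM daily
""").fetchall())

# Calendar-based count: only rows dated within the 30 days ending on each anchor date.
calendar_based = dict(conn.execute("""
    SELECT d.dates, COUNT(s.id)
    FROM (SELECT DISTINCT dates FROM source) d
    LEFT JOIN source s ON s.dates BETWEEN date(d.dates, '-29 days') AND d.dates
    GROUP BY d.dates
""").fetchall())
conn.close()

print(rows_based["2023-03-01"], calendar_based["2023-03-01"])  # 2 1
```

The row-based window pulls the 2023-01-01 row into the 2023-03-01 total even though it is two months old, which is exactly the behavior the caveat describes.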
SELECT count(*) AS Counts,
dates AS Date
FROM source
WHERE dates >= DATEADD(DAY, -30, CURRENT_DATE)
GROUP BY dates
ORDER BY dates DESC

SQL Retention Rates

I am trying to construct a rolling retention measure but am having trouble figuring out how to do it in Redshift.
I have defined retention as the intersection between two sets of users. The first is a cohort of distinct user IDs who were active at least once in the 30 days ending 90 days before today (i.e., between 90 and 120 days ago). The second is the set of those users who were also active in the last 30 days.
Retention = (today's 30-day active users who were in the original cohort from 90 days ago) / (30-day active users 90 days ago)
My sessions table looks like this:
id  created_date
1   2021-03-04
1   2021-01-01
1   2020-12-15
2   2021-02-17
The only way I can seem to do this is as follows:
Step 1: Create a temp table and insert the row for today's date:
-- t1: users active in the 30 days up to the reference date (today)
with t1 as (
select distinct customer_id id
from sessions
where created_date >= dateadd('day', -29, current_date)
)
-- t2: the cohort, users active 89 to 119 days before the reference date
, t2 as (
select distinct customer_id id
from sessions
where created_date <= dateadd('day', -89, current_date)
and created_date >= dateadd('day', -119, current_date)
)
-- start from the cohort so original is the cohort size and current is the retained subset
select current_date,
count(t2.id) as original,
count(t1.id) as current,
round(cast(count(t1.id) as float) / cast(count(t2.id) as float), 2) as ratio
into temp table temp1
from t2
left join t1
on t2.id = t1.id
Step 2: Run an insert statement into the temp table multiple times, subtracting one day from current_date in each query:
insert into temp1
-- same query as step 1 with the reference date moved back one day
with t1 as (
select distinct customer_id id
from sessions
where created_date >= dateadd('day', -29, current_date-1)
)
, t2 as (
select distinct customer_id id
from sessions
where created_date <= dateadd('day', -89, current_date-1)
and created_date >= dateadd('day', -119, current_date-1)
)
select current_date-1,
count(t2.id) as original,
count(t1.id) as current,
round(cast(count(t1.id) as float) / cast(count(t2.id) as float), 2) as ratio
from t2
left join t1
on t2.id = t1.id
The goal is a table with a daily retention rate for every day so far in 2021.
The original column is the cohort of 30-day active users as of 90 days before the reference date.
The current column is the number of users from that cohort who are 30-day active users at the reference date.
Step 1 returns only the first row (2021-03-05), and step 2 gives me the other row:
date        original  current  ratio
2021-03-05  100       70       0.7
2021-03-04  100       60       0.6
This process is obviously very inefficient, and I am trying to figure out whether there is a faster, easier way to do it. The issue is that I need to take a distinct user cohort from 3 months ago and see how many of those users are still active today.
All help will be greatly appreciated!
If you want the number of 30-day active users on each date and on the date 90 days earlier, the query is:
-- t1: distinct users active in the 30 days ending on each session date
with t1 as (
select
s2.created_date,
count(distinct s1.customer_id) as cnt30
from sessions s1 inner join
(select distinct created_date from sessions) s2
on dateadd('day', -29, s2.created_date)<=s1.created_date
and s1.created_date<=s2.created_date
group by s2.created_date
)
-- pair each date (a1) with the date 89 days earlier (a2)
select a1.created_date,
a2.cnt30 as original,
a1.cnt30 as current,
round(cast(a1.cnt30 as float) / cast(a2.cnt30 as float), 2) as ratio
from t1 as a1 inner join t1 as a2
on dateadd('day', -89, a1.created_date)=a2.created_date
order by 1
Using the subquery in the select-list, the query is:
with t1 as (
select
s2.created_date,
-- correlated subquery: distinct users in the 30 days ending on s2.created_date
(select count(distinct s1.customer_id) from sessions s1
where dateadd('day', -29, s2.created_date)<=s1.created_date
and s1.created_date<=s2.created_date) as cnt30
from
(select distinct created_date from sessions) s2
)
select a1.created_date,
a2.cnt30 as original,
a1.cnt30 as current,
round(cast(a1.cnt30 as float) / cast(a2.cnt30 as float), 2) as ratio
from t1 as a1 inner join t1 as a2
on dateadd('day', -89, a1.created_date)=a2.created_date
order by 1
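First, use a join (or a subquery in the select list) to calculate the number of unique IDs in the 30 days up to each date.
Next, self-join that result to output the count on each date next to the count from 90 days earlier.
Keep in mind that this compares the sizes of the two active populations; it does not check whether today's active users belong to the earlier cohort.
Note that I've never used Redshift, so I wrote this based on your query and common SQL syntax. I hope my answer helps you.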
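If the true cohort overlap is needed, one way to avoid running one insert per day is to join the sessions to a list of reference dates and build both sets (the cohort and the recent actives) per reference date. The sketch below is SQLite through Python's sqlite3 and has not been tried on Redshift, where date(x, '-N days') would become dateadd(day, -N, x) and the reference-date list might need to come from a table. The windows follow the question's offsets (119 to 89 days back for the cohort, 29 days back to the reference date for current activity); the session rows and reference dates are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sessions (customer_id INTEGER, created_date TEXT);
    INSERT INTO sessions VALUES
        -- cohort activity: users 1-4 all active in late November 2020
        (1, '2020-11-20'), (2, '2020-11-20'), (3, '2020-11-20'), (4, '2020-11-20'),
        -- later activity
        (1, '2021-03-05'), (2, '2021-02-20'), (3, '2021-01-10'), (5, '2021-03-01');
""")
retention = conn.execute("""
    WITH ref(ref_date) AS (VALUES ('2021-03-05'), ('2021-03-04')),
    cohort AS (   -- active 119 to 89 days before each reference date
        SELECT DISTINCT r.ref_date, s.customer_id
        FROM ref r JOIN sessions s
          ON s.created_date BETWEEN date(r.ref_date, '-119 days')
                                AND date(r.ref_date, '-89 days')
    ),
    recent AS (   -- active in the 30 days up to each reference date
        SELECT DISTINCT r.ref_date, s.customer_id
        FROM ref r JOIN sessions s
          ON s.created_date BETWEEN date(r.ref_date, '-29 days') AND r.ref_date
    )
    SELECT c.ref_date,
           COUNT(*) AS "original",
           COUNT(a.customer_id) AS "current",
           ROUND(CAST(COUNT(a.customer_id) AS REAL) / COUNT(*), 2) AS ratio
    FROM cohort c
    LEFT JOIN recent a
      ON a.ref_date = c.ref_date AND a.customer_id = c.customer_id
    GROUP BY c.ref_date
    ORDER BY c.ref_date DESC
""").fetchall()
conn.close()
print(retention)
```

User 5 is active recently but not in the cohort, so it never affects the ratio. Reference dates whose cohort is empty simply do not appear in the output.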

How to select all dates in SQL query

SELECT oi.created_at, count(oi.id_order_item)
FROM order_item oi
The result is the following:
2016-05-05 1562
2016-05-06 3865
2016-05-09 1
...etc
The problem is that I need a row for every day, even if there were no order items on that date.
Expected result:
Date Quantity
2016-05-05 1562
2016-05-06 3865
2016-05-07 0
2016-05-08 0
2016-05-09 1
You can't count something that is not in the database. So you need to generate the missing dates in order to be able to "count" them.
SELECT d.dt, count(oi.id_order_item)
FROM (
select dt::date
from generate_series(
(select min(created_at) from order_item),
(select max(created_at) from order_item), interval '1' day) as x (dt)
) d
left join order_item oi on oi.created_at = d.dt
group by d.dt
order by d.dt;
The query gets the minimum and maximum dates from the existing order items.
If you want the count for a specific date range you can remove the sub-selects:
SELECT d.dt, count(oi.id_order_item)
FROM (
select dt::date
from generate_series(date '2016-05-01', date '2016-05-31', interval '1' day) as x (dt)
) d
left join order_item oi on oi.created_at = d.dt
group by d.dt
order by d.dt;
SQLFiddle: http://sqlfiddle.com/#!15/49024/5
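The same "generate the missing dates, then LEFT JOIN" idea works in databases without generate_series. This is a sketch in SQLite via Python's sqlite3, assuming created_at holds plain ISO dates with no time part; a recursive CTE builds the calendar from the minimum to the maximum created_at, and the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE order_item (id_order_item INTEGER PRIMARY KEY, created_at TEXT)")
# Hypothetical sample: 2 items on 05-05, 3 on 05-06, none on 05-07/05-08, 1 on 05-09.
rows = ["2016-05-05"] * 2 + ["2016-05-06"] * 3 + ["2016-05-09"]
conn.executemany("INSERT INTO order_item (created_at) VALUES (?)", [(d,) for d in rows])

result = conn.execute("""
    WITH RECURSIVE days(dt) AS (
        SELECT MIN(created_at) FROM order_item         -- first day with data
        UNION ALL
        SELECT date(dt, '+1 day') FROM days            -- add one day at a time
        WHERE dt < (SELECT MAX(created_at) FROM order_item)
    )
    SELECT d.dt, COUNT(oi.id_order_item)               -- NULLs from the LEFT JOIN count as 0
    FROM days d
    LEFT JOIN order_item oi ON oi.created_at = d.dt
    GROUP BY d.dt
    ORDER BY d.dt
""").fetchall()
conn.close()
print(result)
```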
Friend, PostgreSQL's count function ignores NULL values: it does not count rows where the column you pass to it is null. You also need to include oi.created_at in a GROUP BY clause.
Because COUNT is an aggregate, every non-aggregated column in the SELECT list (here oi.created_at) must appear in the GROUP BY; grouping by oi.created_at returns one count per date.
SELECT oi.created_at, count(oi.id_order_item)
FROM order_item oi
GROUP BY oi.created_at
From TechontheNet (my most trusted source of information):
Because you have listed one column in your SELECT statement that is not encapsulated in the count function, you must use a GROUP BY clause. The department field must, therefore, be listed in the GROUP BY section.
Some info on COUNT in PostgreSQL:
http://www.postgresqltutorial.com/postgresql-count-function/
http://www.techonthenet.com/postgresql/functions/count.php
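Solution #1: Use a date table that stores every date, then LEFT JOIN to it over the period you need.
Solution #2 (SQL Server syntax):
WITH DateTable AS
(
-- anchor row: tomorrow (GETDATE() + 1 day)
SELECT DATEADD(dd, 1, CONVERT(DATETIME, GETDATE())) AS CreateDateTime, 1 AS Cnter
UNION ALL
-- each recursive step goes back one day, stopping after 5 rows
SELECT DATEADD(dd, -1, CreateDateTime), DateTable.Cnter + 1
FROM DateTable
WHERE DateTable.Cnter + 1 <= 5
)
-- LEFT JOIN the generated dates to the data so days without rows count as 0
SELECT CAST(d.CreateDateTime AS DATE) AS dt, COUNT(oi.id_order_item) AS quantity
FROM DateTable d
LEFT JOIN order_item oi ON oi.created_at = CAST(d.CreateDateTime AS DATE)
GROUP BY CAST(d.CreateDateTime AS DATE)
ORDER BY dt;
Alternatively, materialize the generated dates into a temporary table based on your input and LEFT JOIN to that.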

Oracle SQL to get count of first occurrences of a row in the full data set filtered by date?

We have a dataset that looks like this:
ID eventType date
--------------------------------
1 foo 2 March 2013
2 foo 3 March 2013
3 bar 3 March 2013
4 foo 5 March 2013
5 foo 6 March 2013
6 bar 7 March 2013
7 baz 8 March 2013
I can easily get the unique list of eventTypes from this list. However, how do I count the eventTypes that first appeared between startDate and endDate? I want to be able to use a date range of 7 March 2013 - 10 March 2013 and get a count of 1, since baz was a newly occurring eventType during that range. On the other hand, a date range of 5 March 2013 - 7 March 2013 should return a count of 0, since no new eventTypes appeared in that range.
I would just look at using the MIN aggregate function to find the earliest occurrence of the event type before the end date of the query. Then I would see if any events had their earliest event after the start date of the range.
SELECT event_type, date_value
from (
SELECT event_type, min(date_value) as date_value
from your_table
where date_value <= date '2013-03-10'
group by event_type
)
where date_value >= date '2013-03-07'
I think this should work for you. Basically join the table against itself:
SELECT COUNT(DISTINCT T.EventType)
FROM YourTable T
LEFT JOIN YourTable T2 ON T.eventType = T2.eventType AND T2.dateField < to_date('2013-03-07','yyyy-mm-dd')
WHERE T.DateField BETWEEN to_date('2013-03-07','yyyy-mm-dd')
AND to_date('2013-03-10','yyyy-mm-dd')
AND T2.Id IS NULL
And here is the SQL Fiddle.
-- EDIT
As @JoachimIsaksson correctly pointed out, you can just as easily (and probably preferably) change the LEFT JOIN to:
LEFT JOIN YourTable T2 ON T.eventType = T2.eventType AND T2.dateField < T.dateField
Good luck.
Something like this should do the trick:
SELECT COUNT(DISTINCT eventType)
FROM YOUR_TABLE T1
WHERE
date BETWEEN :startDate AND :endDate
AND NOT EXISTS (
SELECT *
FROM YOUR_TABLE T2
WHERE T1.eventType = T2.eventType AND T2.date < :startDate
)
In plain English:
Exclude all rows whose eventType already exists before the given date.
And then just count the distinct occurrences of what's left in the given date range.
Another way to express the same would be using MINUS:
SELECT COUNT(*)
FROM (
SELECT eventType
FROM your_table
WHERE date BETWEEN :startDate AND :endDate
MINUS
SELECT eventType
FROM your_table
WHERE date < :startDate
);
Note: COUNT(DISTINCT) isn't required in this case because MINUS implies DISTINCT, i.e. the left side of MINUS will return only unique entries.
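The answers above (MIN per type, NOT EXISTS, and MINUS) can be cross-checked on the question's sample data. This sketch uses SQLite through Python's sqlite3 rather than Oracle, so MINUS is written EXCEPT, dates are stored as ISO strings, and the names events, event_type and date_value are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE events (id INTEGER, event_type TEXT, date_value TEXT);
    INSERT INTO events VALUES
        (1, 'foo', '2013-03-02'), (2, 'foo', '2013-03-03'),
        (3, 'bar', '2013-03-03'), (4, 'foo', '2013-03-05'),
        (5, 'foo', '2013-03-06'), (6, 'bar', '2013-03-07'),
        (7, 'baz', '2013-03-08');
""")

QUERIES = {
    # First occurrence per type (MIN), then keep those on or after the start date.
    "min": """
        SELECT COUNT(*) FROM (
            SELECT event_type, MIN(date_value) AS first_seen
            FROM events WHERE date_value <= :end GROUP BY event_type
        ) WHERE first_seen >= :start""",
    # Types seen in the range that have no earlier row at all.
    "not_exists": """
        SELECT COUNT(DISTINCT t1.event_type) FROM events t1
        WHERE t1.date_value BETWEEN :start AND :end
          AND NOT EXISTS (SELECT 1 FROM events t2
                          WHERE t2.event_type = t1.event_type
                            AND t2.date_value < :start)""",
    # EXCEPT is the SQLite (and standard SQL) spelling of Oracle's MINUS.
    "except": """
        SELECT COUNT(*) FROM (
            SELECT event_type FROM events WHERE date_value BETWEEN :start AND :end
            EXCEPT
            SELECT event_type FROM events WHERE date_value < :start)""",
}

def new_type_counts(start, end):
    params = {"start": start, "end": end}
    return {name: conn.execute(sql, params).fetchone()[0] for name, sql in QUERIES.items()}

print(new_type_counts("2013-03-07", "2013-03-10"))  # {'min': 1, 'not_exists': 1, 'except': 1}
print(new_type_counts("2013-03-05", "2013-03-07"))  # {'min': 0, 'not_exists': 0, 'except': 0}
```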
I do not fully understand your data and question, but I think you could use analytic functions to partition your data by event type, ordered by date (optionally restricted with a RANGE BETWEEN window). Then row_number() or rank()/dense_rank() within each partition gives you a sequence, and the rows with the lowest sequence number may be your answer.

How to get the last date from a DB table in MySQL

I have this table in my DB:
categoriesSupports-> id, category_id, support_id, date
The thing is that I need to extract every support_id whose date is the closest to now.
For example, if the DB table contains:
id, category_id, support_id, date
1 1 1 2010-11-23
2 1 2 2010-11-25
3 1 1 2010-11-26
4 1 3 2010-11-24
I need to get just:
id, category_id, support_id, date
2 1 2 2010-11-25
3 1 1 2010-11-26
4 1 3 2010-11-24
To be clear: I need the closest date for each support_id, and I only have dates in the past.
I've been trying a lot and I don't know how.
The following should give you:
all the categoriesSupports for the current date (one or multiple)
One previous categoriesSupport (if it exists)
One future categoriesSupport (if it exists)
(
SELECT *
FROM `categoriesSupports`
WHERE `date` < CURDATE()
ORDER BY `date` DESC
LIMIT 1
)
UNION
(
SELECT *
FROM `categoriesSupports`
WHERE `date` = CURDATE()
)
UNION
(
SELECT *
FROM `categoriesSupports`
WHERE `date` > CURDATE()
ORDER BY `date` ASC
LIMIT 1
)
A. This answers 'where date is the closest date from now...':
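SELECT *
FROM `categoriesSupports`
WHERE `date` IN (
-- MySQL does not allow LIMIT directly inside an IN subquery, so wrap it in a derived table
SELECT `date` FROM (
SELECT DISTINCT `date`
FROM `categoriesSupports`
ORDER BY `date` DESC
LIMIT 1
) AS latest_dates
)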
Notes:
You can set LIMIT n to select entries for more dates.
If you only want for the last date you can replace IN with = because the sub-select will return only one value.
If your table includes future dates, replace ORDER BY `date` DESC with ORDER BY ABS(DATEDIFF(NOW(), `date`)) ASC.
A solution with a JOIN (works only if all dates are in the past):
SELECT a.*
FROM `categoriesSupports` AS a
LEFT JOIN `categoriesSupports` AS b
ON b.date > a.date
WHERE b.id IS NULL
Added just for reference.
B. This answers 'where date is in the last 3 days (including today)':
SELECT *
FROM `categoriesSupports`
WHERE DATEDIFF(NOW(), `date`) < 3
Replace 3 with any number if you want more or less days.
C. Same as A, but per support_id:
SELECT a.*
FROM `categoriesSupports` AS a
LEFT JOIN `categoriesSupports` AS b
ON b.support_id = a.support_id AND b.date > a.date
WHERE b.id IS NULL
This answers the latest version of the question.
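Query C can be checked against the sample rows from the question. This is a minimal sketch in SQLite via Python's sqlite3 (not MySQL, so the backticks are dropped); it should return exactly the three rows the question expects.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE categoriesSupports (
        id INTEGER PRIMARY KEY, category_id INTEGER, support_id INTEGER, date TEXT);
    INSERT INTO categoriesSupports VALUES
        (1, 1, 1, '2010-11-23'), (2, 1, 2, '2010-11-25'),
        (3, 1, 1, '2010-11-26'), (4, 1, 3, '2010-11-24');
""")
# Anti-join: keep row a only if no row b for the same support_id has a later date.
latest = conn.execute("""
    SELECT a.*
    FROM categoriesSupports AS a
    LEFT JOIN categoriesSupports AS b
      ON b.support_id = a.support_id AND b.date > a.date
    WHERE b.id IS NULL
    ORDER BY a.id
""").fetchall()
conn.close()
print(latest)
```

If two rows share the latest date for the same support_id, neither has a later match, so both are returned.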
SELECT *
FROM CurrentDeals
WHERE (julianday(Date('now')) - julianday(date))<=3
ORDER BY date ASC
Here you have to decide what "closest" means. I used 3 days as an example: this lists the records where today's date minus the record's date is at most 3 days.
Hope this is what you wanted.