Finding the customers that are coming back in a week - sql

Here is my table:
sessid userid date prodcode
xxxxx xx0101 01/01/2020 rpd032
xxxxx xx2021 01/01/2020 xxxx01
xxxxx xx0101 01/01/2020 xx0381
xxxxx xxju23 02/01/2020 xxx023
xxxxx xxjp17 03/01/2020 xxx016
xxxxx xxju23 03/01/2020 xxxx03
xxxxx xx2021 04/01/2020 xxx023
xxxxx xxx270 05/01/2020 xxx023
xxxxx xx0j34 06/01/2020 rpd032
xxxxx xxcj02 07/01/2020 xxx333
xxxxx xxjr04 08/01/2020 rpd032
I want to run a query every week; I might turn it into a procedure later. For now, I want to know the number of customers coming back to the website in the week starting 02/01/2020. As you can see from the sample above, only one customer comes back (xxju23), so the result of my query should be 1, but I am struggling with it.
select count(userid)
from (
select userid, count(*) as comingbak
from orders
where customers in dateadd(week,7,'02/01/2020')
groupby comingback
having cominback > 1
);

I understand that you are looking for the count of customers that had more than one visit to the website during the week that started on January 2nd.
Consider:
select count(*)
from (
select 1
from orders
where date >= '20200102' and date < dateadd(week, 1, '20200102')
group by userid
having count(*) > 1
) t
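The same idea can be tried end to end with a small sketch. This is only an illustration, not the answerer's code: it uses Python's sqlite3 module, so dateadd is replaced by SQLite's date(..., '+7 days'), and it assumes the sample dates are dd/mm/yyyy and rewrites them as ISO text. The function name returning_customers is made up for the example.

```python
import sqlite3

# Sample rows from the question, with dates rewritten as ISO yyyy-mm-dd text
# (assumption: the original values are dd/mm/yyyy).
ROWS = [
    ("xx0101", "2020-01-01"), ("xx2021", "2020-01-01"), ("xx0101", "2020-01-01"),
    ("xxju23", "2020-01-02"), ("xxjp17", "2020-01-03"), ("xxju23", "2020-01-03"),
    ("xx2021", "2020-01-04"), ("xxx270", "2020-01-05"), ("xx0j34", "2020-01-06"),
    ("xxcj02", "2020-01-07"), ("xxjr04", "2020-01-08"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (userid TEXT, date TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", ROWS)


def returning_customers(week_start):
    """Count users with more than one order in [week_start, week_start + 7 days)."""
    query = """
        SELECT count(*)
        FROM (
            SELECT userid
            FROM orders
            WHERE date >= :start AND date < date(:start, '+7 days')
            GROUP BY userid
            HAVING count(*) > 1
        ) AS t
    """
    return conn.execute(query, {"start": week_start}).fetchone()[0]


print(returning_customers("2020-01-02"))  # 1 (only xxju23 has two orders that week)
```

Passing the week start as a parameter is what would make this easy to wrap in a weekly job or procedure later.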

You can use datepart(wk, date) to get the week of the year.
;with t1 as ( -- Exclude repeat visits by the same customer on the same date
select distinct userid, date
from #table1
),
t2 as (-- Get week in year
select userid, 'Week ' + cast(datepart(wk, date) as varchar(2)) Week
from t1
)
select userid, Week, count(*) as numberOfVisit -- group by userId and week in year
from t2
group by userid, Week
having count(*) > 1
You can also count the qualifying customers to get the final result.
;with t1 as (
select distinct userid, date
from #table1
),
t2 as (
select userid, 'Week ' + cast(datepart(wk, date) as varchar(2)) Week
from t1
),
t3 as (
select userid, Week, count(*) as numberOfVisit
from t2
group by userid, Week
having count(*) > 1)
select count(*) Total
from t3

Based on GMB's comment, there are the following mistakes (feel free to correct me if I'm mistaken):
Syntax error: No column name was specified for column 1 of 't'. https://ibb.co/52FBxMc
The condition in the WHERE clause combined with having count(*) > 1 is wrong: you won't get any value with >= '20200102'. It should be >= '20200101'.
You will get xx0101 as well. However, it should be excluded because it came back on the same day. https://ibb.co/YtbCL1z
You should select userid or something similar instead of 1, as selecting 1 is confusing.
Your condition only works for the time range starting 20200101, while it should be dynamic.
In short, @Phong's answer might be more suitable.
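The same-day point above can be checked with a small sketch. It is an illustration only: it uses Python's sqlite3 module rather than SQL Server, assumes the sample dates are dd/mm/yyyy, and the helper name repeat_visitors is hypothetical. It compares counting rows with counting distinct dates for the week starting 2020-01-01.

```python
import sqlite3

# Sample rows from the question as ISO dates (assumption: originals are dd/mm/yyyy).
ROWS = [
    ("xx0101", "2020-01-01"), ("xx2021", "2020-01-01"), ("xx0101", "2020-01-01"),
    ("xxju23", "2020-01-02"), ("xxjp17", "2020-01-03"), ("xxju23", "2020-01-03"),
    ("xx2021", "2020-01-04"), ("xxx270", "2020-01-05"), ("xx0j34", "2020-01-06"),
    ("xxcj02", "2020-01-07"), ("xxjr04", "2020-01-08"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (userid TEXT, date TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", ROWS)


def repeat_visitors(week_start, distinct_days):
    """List users seen more than once in the 7 days starting at week_start.

    With distinct_days=True, several orders on the same date count as one visit.
    """
    counted = "DISTINCT date" if distinct_days else "*"
    query = f"""
        SELECT userid
        FROM orders
        WHERE date >= :start AND date < date(:start, '+7 days')
        GROUP BY userid
        HAVING count({counted}) > 1
        ORDER BY userid
    """
    return [row[0] for row in conn.execute(query, {"start": week_start})]


by_rows = repeat_visitors("2020-01-01", distinct_days=False)
by_days = repeat_visitors("2020-01-01", distinct_days=True)
print(by_rows)  # ['xx0101', 'xx2021', 'xxju23']
print(by_days)  # ['xx2021', 'xxju23']
```

xx0101 placed two orders on the same day, so it appears only when rows are counted rather than distinct dates.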

Related

How to find the number of occurrences within a date range?

Let's say I have hospital visits in the table TestData.
I would like to know which patients have had a second hospital visit within 7 days of their first hospital visit.
How would I code this in SQL?
patient_id is stored as TEXT.
The date column, date_visit, is also TEXT and uses the format MM/DD/YYYY.
patient_id | date_visit
A123B29133 | 07/12/2011
A123B29133 | 07/14/2011
A123B29133 | 07/20/2011
A123B29134 | 12/05/2016
In the above table, patient A123B29133 fulfills the condition, as they were seen on 07/14/2011, which is less than 7 days after 07/12/2011.
You can use a subquery with exists:
with to_d(id, v_date) as (
select patient_id, substr(date_visit, 7, 4)||'-'||substr(date_visit, 1, 2)||'-'||substr(date_visit, 4, 2) from visits
)
select t2.id from (select t1.id, min(t1.v_date) d1 from to_d t1 group by t1.id) t2
where exists (select 1 from to_d t3 where t3.id = t2.id and t3.v_date != t2.d1 and t3.v_date <= date(t2.d1, '+7 days'))
id
A123B29133
Since your date column is not in YYYY-MM-DD format, which is the format expected by SQLite's date functions, the substr function was used to transform your dates into this format. JulianDay was then used to convert the dates to numeric day values, which eases the 7-day comparison. The MIN window function was used to identify the first hospital visit date for each patient. The demo fiddle and samples show the query that was used to transform the data and the results before the final query, which filters based on your requirements, i.e. < 7 days. With this approach using window functions, you may also retrieve the visit_date and the number of days since the first visit date if desired.
You may read more about SQLite date functions in the SQLite documentation.
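The two building blocks described above, the substr rearrangement and the julianday difference, can be tried in isolation. The following is a minimal sketch using Python's sqlite3 module; the helper names to_iso and days_between are invented for the illustration and are not part of the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")


def to_iso(mdy):
    """Rearrange an MM/DD/YYYY string into YYYY-MM-DD using SQLite's substr."""
    sql = (
        "SELECT substr(:d, 7, 4) || '-' || substr(:d, 1, 2) || '-' || substr(:d, 4, 2)"
    )
    return conn.execute(sql, {"d": mdy}).fetchone()[0]


def days_between(first_mdy, second_mdy):
    """Difference in days between two MM/DD/YYYY dates, computed with julianday."""
    sql = "SELECT julianday(:b) - julianday(:a)"
    params = {"a": to_iso(first_mdy), "b": to_iso(second_mdy)}
    return conn.execute(sql, params).fetchone()[0]


print(to_iso("07/12/2011"))                      # 2011-07-12
print(days_between("07/12/2011", "07/14/2011"))  # 2.0
print(days_between("07/12/2011", "07/20/2011"))  # 8.0
```

julianday returns a floating-point day number, so the differences come back as 2.0 and 8.0 rather than integers.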
Query #1
SELECT
patient_id,
visit_date,
JulianDay(visit_date) -
MIN(JulianDay(visit_date)) OVER (PARTITION BY patient_id)
as num_of_days_since_first_visit
FROM
(
SELECT
*,
(
substr(date_visit,7) || '-' ||
substr(date_visit,1,2) || '-' ||
substr(date_visit,4,2)
) as visit_date
FROM
visits
) v;
patient_id | visit_date | num_of_days_since_first_visit
A123B29133 | 2011-07-12 | 0
A123B29133 | 2011-07-14 | 2
A123B29133 | 2011-07-20 | 8
A123B29134 | 2016-12-05 | 0
Query #2
Below is your desired query, which uses the previous query as a CTE and keeps only visits less than 7 days after the first one. The num_of_days <> 0 condition removes the rows for the first visit itself.
WITH num_of_days_since_first_visit AS (
SELECT
patient_id,
visit_date,
JulianDay(visit_date) - MIN(JulianDay(visit_date)) OVER (PARTITION BY patient_id) num_of_days
FROM
(
SELECT
*,
(
substr(date_visit,7) || '-' ||
substr(date_visit,1,2) || '-' ||
substr(date_visit,4,2)
) as visit_date
FROM
visits
) v
)
SELECT DISTINCT
patient_id
FROM
num_of_days_since_first_visit
WHERE
num_of_days <> 0 AND num_of_days < 7;
patient_id
A123B29133
Let me know if this works for you.
I would like to know which patients have had a second hospital visit within 7 days of their first hospital visit.
You can use lag(). The following gets all rows where this is true:
select t.*
from (select t.*,
lag(date_visit) over (partition by patient_id order by date_visit) as prev_date_visit
from t
) t
where prev_date_visit >= date(date_visit, '-7 day');
If you just want the patient_ids, you can use select distinct patient_id.
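Because date_visit is stored as MM/DD/YYYY text, the lag() comparison only behaves as intended once the values are in ISO form. The sketch below shows one way to combine the two steps in SQLite from Python. It assumes a SQLite build with window functions (3.25 or later) and a table named visits; the CTE names iso and with_prev are made up. Note that it flags any visit that follows the previous one within 7 days, which is not necessarily the patient's second visit.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE visits (patient_id TEXT, date_visit TEXT)")
conn.executemany(
    "INSERT INTO visits VALUES (?, ?)",
    [
        ("A123B29133", "07/12/2011"),
        ("A123B29133", "07/14/2011"),
        ("A123B29133", "07/20/2011"),
        ("A123B29134", "12/05/2016"),
    ],
)

QUERY = """
WITH iso AS (
    -- Convert MM/DD/YYYY text into sortable YYYY-MM-DD text.
    SELECT patient_id,
           substr(date_visit, 7, 4) || '-' ||
           substr(date_visit, 1, 2) || '-' ||
           substr(date_visit, 4, 2) AS visit_date
    FROM visits
),
with_prev AS (
    -- Attach each patient's previous visit date to every row.
    SELECT patient_id,
           visit_date,
           lag(visit_date) OVER (
               PARTITION BY patient_id ORDER BY visit_date
           ) AS prev_visit_date
    FROM iso
)
SELECT DISTINCT patient_id
FROM with_prev
WHERE prev_visit_date >= date(visit_date, '-7 days')
"""

patients = [row[0] for row in conn.execute(QUERY)]
print(patients)  # ['A123B29133']
```

Rows without a previous visit have a NULL prev_visit_date, so the comparison filters them out without an explicit check.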

Cumulative count over weeks in SQL

I have a table of items with owner ids referencing users in the users table.
I want to show, for each week (grouped by week), how many items were created that week per user plus all the items created before: a cumulative count.
For this table:
id | owner | created
1 | xxxxx | '2021-01-01'
2 | xxxxx | '2021-01-01'
3 | xxxxx | '2021-01-09'
I want to get:
count | owner | week
2 | xxxxx | '2021-01-01' - '2021-01-07'
3 | xxxxx | '2021-01-08' - '2021-01-14'
This is code for non-cumulative count. How can I change it to be cumulative?
select
count(*),
uu.id,
date_trunc('week', CAST(it.created AS timestamp)) as week
from items it
left join users uu on uu.id = it.owner_id
group by uu.id, week
I'm a little confused by your query:
You have a left join from items to users as if you expect some items with no valid user id.
You are using uu.id in the select, but that would be NULL with no match.
I would suggest:
select it.owner_id,
date_trunc('week', it.created::timestamp) as week_start,
date_trunc('week', it.created::timestamp) + interval '6 day' as week_end,
count(*) as this_week,
sum(count(*)) over (partition by it.owner_id order by min(it.created::timestamp)) as running_count
from items it
group by it.owner_id, week_start;
This uses Postgres syntax because your code looks like Postgres.
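For readers without a Postgres instance at hand, the running count can be sketched in SQLite from Python. This is an approximation under stated assumptions: SQLite has no date_trunc, so the Monday of each week is computed with date(created, 'weekday 0', '-6 days'); window functions need SQLite 3.25 or later; and the column name owner_id and the CTE name weekly are chosen for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER, owner_id TEXT, created TEXT)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?, ?)",
    [(1, "xxxxx", "2021-01-01"), (2, "xxxxx", "2021-01-01"), (3, "xxxxx", "2021-01-09")],
)

QUERY = """
WITH weekly AS (
    -- 'weekday 0' moves forward to Sunday (or stays on it); minus 6 days is that week's Monday.
    SELECT owner_id,
           date(created, 'weekday 0', '-6 days') AS week_start,
           count(*) AS this_week
    FROM items
    GROUP BY owner_id, week_start
)
SELECT owner_id,
       week_start,
       this_week,
       sum(this_week) OVER (PARTITION BY owner_id ORDER BY week_start) AS running_count
FROM weekly
ORDER BY owner_id, week_start
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
# ('xxxxx', '2020-12-28', 2, 2)
# ('xxxxx', '2021-01-04', 1, 3)
```

The weeks here start on Monday, as with date_trunc('week', ...), so the week boundaries differ from the 2021-01-01 based ranges in the question while the cumulative counts (2, then 3) match.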
Remove the user id from the GROUP BY clause and from the SELECT list:
select
count(*),
date_trunc('week', CAST(it.created AS timestamp)) as week
from items it
left join users uu on uu.id = it.owner_id
group by week
Here's a little runnable sample (SQL Server); maybe it will help:
create table #temp (week int, cnt int)
select * from #temp
insert into #temp select 1,2
insert into #temp select 1,1
insert into #temp select 2,3
insert into #temp select 3,3
select
week,
sum(count(*)) over (order by week) as runningCnt
from #temp
group by week
The output is:
week - runningCnt
1 - 2
2 - 3
3 - 4
So in the 1st week there were 2 rows, the next week added 1 more, and the last week added 1 more.
You could also do a cumulative sum of the values in the cnt-column.
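The cumulative sum over the cnt column mentioned above could look like the following sketch. It is written for SQLite through Python's sqlite3 module rather than SQL Server, so a regular table named temp_counts (a made-up name) stands in for #temp, and it assumes window function support (SQLite 3.25 or later).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE temp_counts (week INTEGER, cnt INTEGER)")
conn.executemany(
    "INSERT INTO temp_counts VALUES (?, ?)",
    [(1, 2), (1, 1), (2, 3), (3, 3)],  # same rows as the sample above
)

QUERY = """
WITH weekly AS (
    -- Total of cnt per week: 3, 3, 3 for the sample rows.
    SELECT week, sum(cnt) AS week_total
    FROM temp_counts
    GROUP BY week
)
SELECT week,
       sum(week_total) OVER (ORDER BY week) AS running_total
FROM weekly
ORDER BY week
"""

running = conn.execute(QUERY).fetchall()
print(running)  # [(1, 3), (2, 6), (3, 9)]
```

Aggregating in a CTE first and windowing over the result keeps the two steps separate, which some readers find easier to follow than a nested sum(sum(cnt)).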

Selecting the difference between dates in a stored procedure using a subquery

I can't get my head around whether this is even possible, but I feel like I might have done it before and lost that bit of code. I am trying to craft a select statement that contains an inner join on a subquery to show the number of days between two dates from the same table.
A simple example of the data structure would look like:
Name ID Date Day Hours
Bill 1 3/3/20 Thursday 8
Fred 2 4/3/20 Monday 6
Bill 1 8/3/20 Tuesday 2
Based on this data, I want to select each row plus an extra column which is the number of days between the date from each row for each ID. Something like:
Select * from tblData
Inner join (datediff(Select Top(1) Date from tblData where Date < Date), Date) And ID = ID)
or for simplicity:
Select * from tblData
Inner join (datediff(Select Top(1) Date from tblData where Date < 8/3/20), 8/3/20) And ID = 1)
The resulting dataset would look like:
Name ID Date Day Hours DaysBtwn
Bill 1 3/3/20 Thursday 8 4 (Assuming there was an earlier row in the table)
Fred 2 4/3/20 Monday 6 5 (Assuming there was an earlier row in the table)
Bill 1 8/3/20 Tuesday 2 5 (Based on the previous row date being 3/3/20 for Bill)
Does this make sense, and am I trying to do this the wrong way? I want to do this for about 600000 rows in the table, so efficiency is key. If there is a better way to do this, I'm open to suggestions.
You can use lag():
select t.*, datediff(day, lag(date) over(partition by id order by date), date) diff
from mytable t
I think you just want lag():
select t.*,
datediff(day,
lag(date) over (partition by name order by date),
date
) as diff
from tblData t;
Note: If you want to filter the data so that some rows are used by lag() but do not appear in the result set, then use a subquery:
select t.*
from (select t.*,
datediff(day,
lag(date) over (partition by name order by date),
date
) as diff
from tblData t
) t
where date < '2020-08-03';
Also note the use of the date constant as a string in YYYY-MM-DD format.
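As a rough check of the lag() approach on the question's three rows, here is a sketch in SQLite driven from Python. It is not SQL Server: datediff is replaced by a julianday subtraction, window functions require SQLite 3.25 or later, the dates are assumed to be dd/mm/yy and are stored as ISO text, and the column name work_date is invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tblData (name TEXT, id INTEGER, work_date TEXT, hours INTEGER)")
conn.executemany(
    "INSERT INTO tblData VALUES (?, ?, ?, ?)",
    [
        ("Bill", 1, "2020-03-03", 8),
        ("Fred", 2, "2020-03-04", 6),
        ("Bill", 1, "2020-03-08", 2),
    ],
)

QUERY = """
SELECT name,
       id,
       work_date,
       -- Days since the same id's previous row; NULL when there is no earlier row.
       CAST(
           julianday(work_date)
           - julianday(lag(work_date) OVER (PARTITION BY id ORDER BY work_date))
           AS INTEGER
       ) AS days_between
FROM tblData
ORDER BY id, work_date
"""

result = conn.execute(QUERY).fetchall()
for row in result:
    print(row)
# ('Bill', 1, '2020-03-03', None)
# ('Bill', 1, '2020-03-08', 5)
# ('Fred', 2, '2020-03-04', None)
```

The first row for each id has no predecessor, so its difference is NULL (None in Python) rather than a guessed value.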

Counting an already counted column in SQL (db2)

I'm pretty new to SQL and have this problem:
I have a populated table with a date column and other columns that are not of interest.
date | name | name2
2015-03-20 | peter | pan
2015-03-20 | john | wick
2015-03-18 | harry | potter
What I'm doing right now is counting everything for each date:
select date, count(*)
from testtable
where date >= current date - 10 days
group by date
What I want to do now is count the resulting rows and only return them if there are fewer than 10 resulting rows.
What I have tried so far is wrapping the whole query in a temp table and then counting everything, which gives me the number of resulting rows:
with temp_count (date, counter) as
(
select date, count(*)
from testtable
where date >= current date - 10 days
group by date
)
select count(*)
from temp_count
What is still missing is the check that the number is smaller than 10.
I searched this forum and came across some "having" constructs, but they forced me to use a "group by", which I can't.
I was thinking about something like this:
with temp_count (date, counter) as
(
select date, count(*)
from testtable
where date >= current date - 10 days
group by date
)
select *
from temp_count
having count(*) < 10
Maybe I'm too tired to think of an easy solution, but I can't solve this so far.
Edit: A picture for clarification, since my English is horrible:
http://imgur.com/1O6zwoh
I want to see the two-column results ONLY IF there are fewer than 10 rows overall.
I think you just need to move your having clause to the inner query so that it is paired with the GROUP BY:
with temp_count (date, counter) as
(
select date, count(*)
from testtable
where date >= current date - 10 days
group by date
having count(*) < 10
)
select *
from temp_count
If what you want is to return the grouped rows depending on the total number of rows after grouping, then you could do this:
with temp_count (date, counter) as
(
select date, count(*)
from testtable
where date >= current date - 10 days
group by date
)
select date, counter
from (
select date, counter,
row_number() over (order by date) as rseq,
count(*) over () as total_rows
from temp_count
) x
where total_rows >= 10
This will return 0 rows if there are fewer than 10 in total, and will deliver ALL the results if there are 10 or more (add rseq <= 10 to the WHERE clause to get just the first 10 rows). Reverse the comparison to total_rows < 10 to return the rows only when there are fewer than 10.
In your temp_count table, you can filter results with the WHERE clause:
with temp_count (date, counter) as
(
select date, count(*)
from testtable
where date >= current date - 10 days
group by date
)
select *
from temp_count
where counter < 10
Something like:
with t(dt, rn, cnt) as (
select dt, row_number() over (order by dt) as rn
, count(1) as cnt
from testtable
where dt >= current date - 10 days
group by dt
)
select dt, cnt
from t where 10 >= (select max(rn) from t);
will do what you want (I think)
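To make the "all rows or nothing" behaviour concrete, here is a small sketch using SQLite from Python instead of DB2. It is one possible formulation, not any answerer's exact query: the condition is a scalar subquery that counts the rows of the CTE, the cutoff date is passed in as a parameter instead of current date - 10 days so the example is reproducible, and the function name grouped_if_few is hypothetical.

```python
import sqlite3


def grouped_if_few(rows, cutoff, max_groups=10):
    """Return (date, count) groups on or after cutoff, but only when
    there are fewer than max_groups of them; otherwise return nothing."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE testtable (date TEXT, name TEXT, name2 TEXT)")
    conn.executemany("INSERT INTO testtable VALUES (?, ?, ?)", rows)
    query = """
        WITH temp_count (date, counter) AS (
            SELECT date, count(*)
            FROM testtable
            WHERE date >= :cutoff
            GROUP BY date
        )
        SELECT date, counter
        FROM temp_count
        -- The subquery is evaluated once and applies to every row alike.
        WHERE (SELECT count(*) FROM temp_count) < :max_groups
        ORDER BY date
    """
    result = conn.execute(query, {"cutoff": cutoff, "max_groups": max_groups}).fetchall()
    conn.close()
    return result


few = [
    ("2015-03-20", "peter", "pan"),
    ("2015-03-20", "john", "wick"),
    ("2015-03-18", "harry", "potter"),
]
many = [(f"2015-03-{day:02d}", "x", "y") for day in range(11, 21)]  # 10 distinct dates

print(grouped_if_few(few, "2015-03-10"))   # [('2015-03-18', 1), ('2015-03-20', 2)]
print(grouped_if_few(many, "2015-03-10"))  # []
```

With the question's three rows there are two groups, so both are returned; with ten distinct dates the condition fails for every row and the result is empty.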

sql count statement with multiple date ranges

I have two tables with different appointment dates.
Table 1
id start date
1 5/1/14
2 3/2/14
3 4/5/14
4 9/6/14
5 10/7/14
Table 2
id start date
1 4/7/14
1 4/10/14
1 7/11/13
2 2/6/14
2 2/7/14
3 1/1/14
3 1/2/14
3 1/3/14
If I had fixed date ranges I could count each appointment date just fine, but the date range needs to change per record.
For each id in table 1, I need to count the distinct appointment dates from table 2, BUT only those within the 6 months prior to the start date from table 1.
Example: count all distinct appointment dates for id 1 (in table 2) with appointment dates between 11/1/13 and 5/1/14 (6 months prior). So the result is 2: 4/7/14 and 4/10/14 are within the range, and 7/11/13 is outside of the 6 months.
So my issue is that the range changes for each record, and I cannot figure out how to code this. For id 2 the date range will be 9/2/13 to 3/2/14, and so on.
Thanks everyone in advance!
Try this out:
SELECT id,
(
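SELECT COUNT(DISTINCT table2.start_date)
FROM table2
WHERE table2.id = table1.id
AND table2.start_date >= DATEADD(MM,-6,table1.start_date)
AND table2.start_date <= table1.start_date
) AS table2records
FROM table1
The DATEADD subtracts 6 months from the date in table1, and the subquery returns the count of distinct related appointment dates between that date and the start date.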
I think what you want is a type of join.
select t1.id, count(distinct t2.startdate) as numt2dates
from table1 t1 left outer join
table2 t2
on t1.id = t2.id and
t2.startdate between dateadd(month, -6, t1.startdate) and t1.startdate
group by t1.id;
The exact syntax for the date arithmetic depends on the database.
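As an example of how the date arithmetic changes between databases, the join can be written for SQLite with date(start_date, '-6 months') in place of dateadd. The sketch below runs it from Python on the question's data. It assumes the dates are m/d/yy and stores them as ISO text, uses start_date as the column name, and counts distinct dates as the question asks.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE table1 (id INTEGER, start_date TEXT);
    CREATE TABLE table2 (id INTEGER, start_date TEXT);
    INSERT INTO table1 VALUES
        (1, '2014-05-01'), (2, '2014-03-02'), (3, '2014-04-05'),
        (4, '2014-09-06'), (5, '2014-10-07');
    INSERT INTO table2 VALUES
        (1, '2014-04-07'), (1, '2014-04-10'), (1, '2013-07-11'),
        (2, '2014-02-06'), (2, '2014-02-07'),
        (3, '2014-01-01'), (3, '2014-01-02'), (3, '2014-01-03');
    """
)

QUERY = """
SELECT t1.id,
       count(DISTINCT t2.start_date) AS num_dates
FROM table1 AS t1
LEFT JOIN table2 AS t2
       ON t2.id = t1.id
      -- The window is different for every table1 row.
      AND t2.start_date BETWEEN date(t1.start_date, '-6 months') AND t1.start_date
GROUP BY t1.id
ORDER BY t1.id
"""

counts = conn.execute(QUERY).fetchall()
print(counts)  # [(1, 2), (2, 2), (3, 3), (4, 0), (5, 0)]
```

The left join keeps ids 4 and 5 in the output with a count of 0, which an inner join would drop.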
Thank you, this solved my issue. It may not help you directly, since you are not attempting to group by date, but the answer gave me the insight I needed to resolve the issue I was facing.
I was attempting to count the total users matching a date criterion that had to be evaluated against multiple fields.
WITH data AS (
SELECT generate_series(
(date '2020-01-01')::timestamp,
NOW(),
INTERVAL '1 week'
) AS date
)
SELECT d.date, (SELECT COUNT(DISTINCT h.id) AS user_count
FROM history h WHERE h.startDate < d.date AND h.endDate > d.date
ORDER BY 1 DESC) AS total_records
FROM data d ORDER BY d.date DESC
2022-05-16, 15
2022-05-09, 13
2022-05-02, 13
...