Logical error in selecting rows with correct output - sql

I understand the basics but I am new to DBMSs and I'm learning in a course.
Here is the assignment question:
Write a query to display the number of sales that were made in the last 40 months with the below table:
SALEID SID SLDATE
1001 1 01-JAN-14
1002 5 02-JAN-14
1003 4 01-FEB-14
1004 1 01-MAR-14
1005 2 01-FEB-14
1006 1 01-JUN-15
My query is:
select count(sldate) as sale_count
from sale
where sldate >= add_months(sysdate, -40)
The output expected and that I get is:
SALE_COUNT
0
But I get an error message:
Error: Your query output matches expected result, but there are logical errors.
I'm not sure where I got the logic wrong.

'Last 40 months' is ambiguous.
There are several interpretations of what 'the last n months from date x' means, and Oracle's add_months does not have a monopoly on them (in fact, many people would say it doesn't work as expected: just wait until 30.06 and ask somebody 'what was the date a month ago?' :) )
Imagine today is 20 April.
Does 'last month' include 15, 20, 21, or 25 March?
Does it include 2 April?
That depends. Someone could say that 'last month' runs from 21 March to 20 April.
Someone could say that 'last month' runs from 01.03 to today.
Someone could say that 'last month' starts on 01.04.
Someone could say that 'last month' means the whole of March, but not a single day of April.
It gets even trickier when 'today' is close to the end of a month, especially in March.
Don't be hard on yourself just because you couldn't read the mind of whoever wrote the assignment ;)
I've written a query showing how different approaches might yield different results.
CREATE OR REPLACE FUNCTION temp_can_subst_interval_months(p_date date, p_n_of_months number) RETURN NUMBER AS
V_date DATE;
BEGIN
V_Date := p_date - (NUMTOYMINTERVAL(p_n_of_months, 'month'));
RETURN 1;
EXCEPTION
WHEN OTHERS THEN
RETURN 0;
END;
/
with all_days as (
select to_date('2016-01-01', 'YYYY-MM-DD') + (level - 1) as d
from dual
connect by level < 1462
),
all_days_2 as (
select d date_of_query_being_run,
add_months(d, -40)as min_date_your_approach,
add_months(d, -40) + 1 as min_date_your_approach_2, -- same, but exclude the first day
trunc(add_months(d, -40), 'mm') as min_date_whole_month,
case when temp_can_subst_interval_months(d, 40) = 1 then
d - (interval '40' month)
else null
end as min_date_interval_approach
from all_days ad
order by ad.d
)
select ads.*
from all_days_2 ads
;
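The most interesting results are the dates where your approach differs from the interval approach (an empty column 5 means the interval subtraction raised an error, because the resulting day does not exist):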
1 (sysdate) 2 (yours) 3 4 5 (interval)
31.01.2016 30.09.2012 01.10.2012 01.09.2012
29.02.2016 31.10.2012 01.11.2012 01.10.2012 29.10.2012
31.03.2016 30.11.2012 01.12.2012 01.11.2012
30.04.2016 31.12.2012 01.01.2013 01.12.2012 30.12.2012
29.06.2016 28.02.2013 01.03.2013 01.02.2013
30.06.2016 28.02.2013 01.03.2013 01.02.2013
31.08.2016 30.04.2013 01.05.2013 01.04.2013
30.09.2016 31.05.2013 01.06.2013 01.05.2013 30.05.2013
31.10.2016 30.06.2013 01.07.2013 01.06.2013
30.11.2016 31.07.2013 01.08.2013 01.07.2013 30.07.2013
31.01.2017 30.09.2013 01.10.2013 01.09.2013
28.02.2017 31.10.2013 01.11.2013 01.10.2013 28.10.2013
31.03.2017 30.11.2013 01.12.2013 01.11.2013
30.04.2017 31.12.2013 01.01.2014 01.12.2013 30.12.2013
29.06.2017 28.02.2014 01.03.2014 01.02.2014
30.06.2017 28.02.2014 01.03.2014 01.02.2014
31.08.2017 30.04.2014 01.05.2014 01.04.2014
30.09.2017 31.05.2014 01.06.2014 01.05.2014 30.05.2014
31.10.2017 30.06.2014 01.07.2014 01.06.2014
30.11.2017 31.07.2014 01.08.2014 01.07.2014 30.07.2014
31.01.2018 30.09.2014 01.10.2014 01.09.2014
28.02.2018 31.10.2014 01.11.2014 01.10.2014 28.10.2014
31.03.2018 30.11.2014 01.12.2014 01.11.2014
30.04.2018 31.12.2014 01.01.2015 01.12.2014 30.12.2014
29.06.2018 28.02.2015 01.03.2015 01.02.2015
30.06.2018 28.02.2015 01.03.2015 01.02.2015
31.08.2018 30.04.2015 01.05.2015 01.04.2015
30.09.2018 31.05.2015 01.06.2015 01.05.2015 30.05.2015
31.10.2018 30.06.2015 01.07.2015 01.06.2015
30.11.2018 31.07.2015 01.08.2015 01.07.2015 30.07.2015
31.01.2019 30.09.2015 01.10.2015 01.09.2015
28.02.2019 31.10.2015 01.11.2015 01.10.2015 28.10.2015
31.03.2019 30.11.2015 01.12.2015 01.11.2015
30.04.2019 31.12.2015 01.01.2016 01.12.2015 30.12.2015
30.06.2019 29.02.2016 01.03.2016 01.02.2016
31.08.2019 30.04.2016 01.05.2016 01.04.2016
30.09.2019 31.05.2016 01.06.2016 01.05.2016 30.05.2016
31.10.2019 30.06.2016 01.07.2016 01.06.2016
30.11.2019 31.07.2016 01.08.2016 01.07.2016 30.07.2016
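The two behaviours behind the differences in this table can be sketched outside the database. The following Python is only an illustration: `oracle_style_add_months` and `interval_style_subtract` are hypothetical names, and the rules they encode (end-of-month snapping for add_months, an error for a non-existent day with intervals) are my reading of the results above, not a reimplementation of Oracle.

```python
import calendar
from datetime import date


def shift_month(year, month, delta):
    """Return (year, month) moved by delta months."""
    index = year * 12 + (month - 1) + delta
    return index // 12, index % 12 + 1


def oracle_style_add_months(d, n):
    """Assumed add_months rule: a last-day-of-month input, or a day that
    does not exist in the target month, maps to the target month's last day."""
    year, month = shift_month(d.year, d.month, n)
    last_target = calendar.monthrange(year, month)[1]
    last_source = calendar.monthrange(d.year, d.month)[1]
    if d.day == last_source or d.day > last_target:
        return date(year, month, last_target)
    return date(year, month, d.day)


def interval_style_subtract(d, n):
    """Assumed interval rule: keep the day number, fail if it does not exist.
    None stands in for the error that the SQL function above swallows."""
    year, month = shift_month(d.year, d.month, -n)
    try:
        return date(year, month, d.day)
    except ValueError:
        return None


today = date(2016, 2, 29)
print(oracle_style_add_months(today, -40))   # 2012-10-31
print(interval_style_subtract(today, 40))    # 2012-10-29
print(interval_style_subtract(date(2016, 1, 31), 40))  # None
```

For 29.02.2016 the two rules disagree by two days, which is enough to include or exclude a sale made on 30.10.2012.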
Side note:
Maybe the table contains something like 'future expected sales' and they want you to filter out dates later than sysdate ;)?

Related

count number of records by month over the last five years where record date > select month

I need to show the number of valid inspectors we have by month over the last five years. Inspectors are considered valid when the expiration date on their certification has not yet passed, recorded as the month end date. The SQL below is the query that counts valid inspectors for January 2017:
SELECT Count(*) AS RecordCount
FROM dbo_Insp_Type
WHERE (dbo_Insp_Type.CERT_EXP_DTE >= #2/1/2017#);
Rather than designing 60 queries, one for each month, and compiling the results in a final table (or, err, query) are there other methods I can use that call for less manual input?
From this sample:
Id  CERT_EXP_DTE
1   2022-01-15
2   2022-01-23
3   2022-02-01
4   2022-02-03
5   2022-05-01
6   2022-06-06
7   2022-06-07
8   2022-07-21
9   2022-02-20
10  2021-11-05
11  2021-12-01
12  2021-12-24
this single query:
SELECT
Format([CERT_EXP_DTE],"yyyy/mm") AS YearMonth,
Count(*) AS AllInspectors,
Sum(Abs([CERT_EXP_DTE] >= DateSerial(Year([CERT_EXP_DTE]), Month([CERT_EXP_DTE]), 2))) AS ValidInspectors
FROM
dbo_Insp_Type
GROUP BY
Format([CERT_EXP_DTE],"yyyy/mm");
will return:
YearMonth  AllInspectors  ValidInspectors
2021-11    1              1
2021-12    2              1
2022-01    2              2
2022-02    3              2
2022-05    1              0
2022-06    2              2
2022-07    1              1
ID  Cert_Iss_Dte  Cert_Exp_Dte
1   1/15/2020     1/15/2022
2   1/23/2020     1/23/2022
3   2/1/2020      2/1/2022
4   2/3/2020      2/3/2022
5   5/1/2020      5/1/2022
6   6/6/2020      6/6/2022
7   6/7/2020      6/7/2022
8   7/21/2020     7/21/2022
9   2/20/2020     2/20/2022
10  11/5/2021     11/5/2023
11  12/1/2021     12/1/2023
12  12/24/2021    12/24/2023
A UNION query could calculate a record for each of 50 months but since you want 60, UNION is out.
Or a query with 60 calculated fields using IIf() and Count() referencing a textbox on form for start date:
SELECT Count(IIf(CERT_EXP_DTE>=Forms!formname!tbxDate,1,Null)) AS Dt1,
Count(IIf(CERT_EXP_DTE>=DateAdd("m",1,Forms!formname!tbxDate),1,Null)) AS Dt2,
...
FROM dbo_Insp_Type
Using the above data, following is output for Feb and Mar 2022. I did a test with Cert_Iss_Dte included in criteria and it did not make a difference for this sample data.
Dt1  Dt2
10   8
Or a report with 60 textboxes and each calls a DCount() expression with criteria same as used in query.
Or a VBA procedure that writes data to a 'temp' table.
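The underlying idea, one count per monthly cutoff date instead of one hand-written query per month, can be sketched in a few lines of Python. This is not an Access solution, only an illustration of the loop the options above automate. It uses the twelve CERT_EXP_DTE values from the first sample; `month_starts` and `valid_per_cutoff` are hypothetical helper names, and using the first day of each month as the cutoff is an assumption (the question's January 2017 example uses the first day of the following month).

```python
from datetime import date

# CERT_EXP_DTE values from the first sample table
expiry_dates = [
    date(2022, 1, 15), date(2022, 1, 23), date(2022, 2, 1), date(2022, 2, 3),
    date(2022, 5, 1), date(2022, 6, 6), date(2022, 6, 7), date(2022, 7, 21),
    date(2022, 2, 20), date(2021, 11, 5), date(2021, 12, 1), date(2021, 12, 24),
]


def month_starts(first, count):
    """Yield the first day of `count` consecutive months, starting at `first`."""
    year, month = first.year, first.month
    for _ in range(count):
        yield date(year, month, 1)
        month += 1
        if month > 12:
            year, month = year + 1, 1


def valid_per_cutoff(expiries, first, count):
    """Count certifications expiring on or after each monthly cutoff date."""
    return {
        cutoff.isoformat(): sum(1 for e in expiries if e >= cutoff)
        for cutoff in month_starts(first, count)
    }


counts = valid_per_cutoff(expiry_dates, date(2022, 1, 1), 3)
print(counts)  # {'2022-01-01': 9, '2022-02-01': 7, '2022-03-01': 4}
```

Passing 60 instead of 3 covers five years without any extra manual input.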

SQL last 6 months visits

Purpose of the report: Identify patients who did not have dental cleanings in the last 6 months
What would be the best approach to write a sql script?
Patients table
patient_id  patient_name
11          Jason Strong
22          Ryan Smith
33          Casey Hammer
Visits table
v_id  patient_id  reason_visit     date_of_visit
1     11          medical          01/01/2021
2     22          dental cleaning  11/10/2020
3     22          annual           01/01/2021
4     11          dental cleaning  5/10/2021
5     11          annual           5/1/2021
Expected
patient_id  patient_name
22          Ryan Smith
33          Casey Hammer
Casey is on the list because she is not in the visits table meaning she never received a cleaning from our office.
Ryan Smith is on the list because it is time for his cleaning.
I was also wondering: what if the patient has not had an appointment in the last 6 months but has a future appointment for a dental cleaning? I would want to exclude that patient.
In PostgreSQL:
select * from Patients p
where not exists (
select 1 from Visits v
where v.patient_id = p.patient_id
and reason_visit = 'dental cleaning'
and date_of_visit >= now() - interval '6 month'
)
In SQL Server, replace now() - interval '6 month' with dateadd(month, -6, getdate()).
In MySQL, use date_add(now(), interval -6 month).
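To check the NOT EXISTS pattern against the sample data, here is a small self-contained sketch using Python's built-in sqlite3 module. It lists a patient when no dental cleaning exists on or after the cutoff, six months before a reference date. Several things are assumptions for the sake of a reproducible run: the fixed reference date `AS_OF` (instead of the current date), the reading of the question's dates as month/day/year, and SQLite's `date(..., '-6 months')` standing in for the dialect-specific expressions above.

```python
import sqlite3

AS_OF = "2021-06-01"  # hypothetical "today", fixed so the result is reproducible

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Patients (patient_id INTEGER PRIMARY KEY, patient_name TEXT);
    CREATE TABLE Visits (v_id INTEGER PRIMARY KEY, patient_id INTEGER,
                         reason_visit TEXT, date_of_visit TEXT);
    INSERT INTO Patients VALUES
        (11, 'Jason Strong'), (22, 'Ryan Smith'), (33, 'Casey Hammer');
    -- dates from the question, rewritten as ISO strings (assumed month/day/year)
    INSERT INTO Visits VALUES
        (1, 11, 'medical',         '2021-01-01'),
        (2, 22, 'dental cleaning', '2020-11-10'),
        (3, 22, 'annual',          '2021-01-01'),
        (4, 11, 'dental cleaning', '2021-05-10'),
        (5, 11, 'annual',          '2021-05-01');
""")

# A patient is overdue when no cleaning exists on or after the cutoff.
# Future cleanings also satisfy the >= test, so patients with one are excluded.
overdue = conn.execute("""
    SELECT p.patient_id, p.patient_name
    FROM Patients p
    WHERE NOT EXISTS (
        SELECT 1 FROM Visits v
        WHERE v.patient_id = p.patient_id
          AND v.reason_visit = 'dental cleaning'
          AND v.date_of_visit >= date(?, '-6 months')
    )
    ORDER BY p.patient_id
""", (AS_OF,)).fetchall()

print(overdue)  # [(22, 'Ryan Smith'), (33, 'Casey Hammer')]
```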

Calculate Churn by aggregating by date range in SQL

I am trying to calculate the churn rate from data that has customer_id, group, and date columns. The aggregation is going to be by id, group, and date. The churn formula is (customers in previous cohort - customers in last cohort) / customers in previous cohort.
'Customers in previous cohort' refers to the cohort from the 28 days before the last 28 days.
'Customers in last cohort' refers to the cohort from the last 28 days.
I am not sure how to aggregate them by date range to calculate the churn.
Here is sample data that I copied from SQL Group by Date Range:
Date Group Customer_id
2014-03-01 A 1
2014-04-02 A 2
2014-04-03 A 3
2014-05-04 A 3
2014-05-05 A 6
2015-08-06 A 1
2015-08-07 A 2
2014-08-29 XXXX 2
2014-08-09 XXXX 3
2014-08-10 BB 4
2014-08-11 CCC 3
2015-08-12 CCC 2
2015-03-13 CCC 3
2014-04-14 CCC 5
2014-04-19 CCC 4
2014-08-16 CCC 5
2014-08-17 CCC 3
2014-08-18 XXXX 2
2015-01-10 XXXX 3
2015-01-20 XXXX 4
2014-08-21 XXXX 5
2014-08-22 XXXX 2
2014-01-23 XXXX 3
2014-08-24 XXXX 2
2014-02-25 XXXX 3
2014-08-26 XXXX 2
2014-06-27 XXXX 4
2014-08-28 XXXX 1
2014-08-29 XXXX 1
2015-08-30 XXXX 2
2015-09-31 XXXX 3
The goal is to calculate the churn rate every 28 days in between 2014 and 2015 by the formula given above. So, it is going to be aggregating the data by rolling it by 28 days and calculating the churn by the formula.
Here is what I tried to aggregate the data by date range:
SELECT COUNT(distinct customer_id) AS count_ids, Group,
DATE_SUB(CAST(Date AS DATE), INTERVAL 56 DAY) AS Date_min,
DATE_SUB(CURRENT_DATE, INTERVAL 28 DAY) AS Date_max
FROM churn_agg
GROUP BY count_ids, Group, Date_min, Date_max
I hope someone can help me with the aggregation and the churn calculation. I simply want to subtract each aggregated count_ids from the next aggregated count_ids, 28 days later. So this is a successive subtraction over the same column (count_ids). I am not sure whether I need a rolling window or a simple aggregation to find the churn.
As corrected by @jarlh, the last date is not 2015-09-31 but 2015-09-30.
You can use this to create a calendar of 28-day periods:
create table daysby28 (i int, _Date date);
insert into daysby28 (i, _Date)
SELECT i, cast('01-01-2014'as date) + i*INTERVAL '28 day'
from generate_series(0,50) i
order by 1;
After creating the churn_agg table with the script @jarlh posted in his fiddle, this query gives you what you want:
with cte as
(
select count(Customer) as TotalCustomer, Cohort, CohortDateStart From
(
select distinct a.Customer_id as Customer, b.i as Cohort, b._Date as CohortDateStart
from churn_agg a left join daysby28 b on a._Date >= b._Date and a._Date < b._Date + INTERVAL '28 day'
) a
group by Cohort, CohortDateStart
)
select a.CohortDateStart,
1.0*(b.TotalCustomer - a.TotalCustomer)/(1.0*b.TotalCustomer) as Churn from cte a
left join cte b on a.cohort > b.cohort
and not exists(select 1 from cte c where c.cohort > b.cohort and c.cohort < a.cohort)
order by 1
The fiddle of all together is here
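The same bucketing logic can be followed step by step in a short Python sketch: assign each date to a 28-day cohort counted from 2014-01-01, count distinct customers per cohort, and compare each cohort with the nearest earlier one that has data. The rows below are made up for illustration (they are not the question's data), `cohort_index` and `churn_by_cohort` are hypothetical names, and like the query above the sketch ignores the Group column.

```python
from datetime import date

EPOCH = date(2014, 1, 1)  # start of cohort 0, as in the daysby28 calendar
WINDOW_DAYS = 28


def cohort_index(d):
    """Number of the 28-day window that contains d."""
    return (d - EPOCH).days // WINDOW_DAYS


def churn_by_cohort(rows):
    """rows: iterable of (date, customer_id). Returns {cohort: churn}, where
    churn = (previous - current) / previous, counting distinct customers."""
    customers = {}
    for d, customer_id in rows:
        customers.setdefault(cohort_index(d), set()).add(customer_id)

    churn = {}
    previous = None
    for cohort in sorted(customers):
        if previous is not None:
            prev_count = len(customers[previous])
            churn[cohort] = (prev_count - len(customers[cohort])) / prev_count
        previous = cohort
    return churn


# Made-up sample rows, for illustration only
rows = [
    (date(2014, 1, 5), 1), (date(2014, 1, 10), 2), (date(2014, 1, 20), 2),  # cohort 0: 2 customers
    (date(2014, 2, 1), 1),                                                  # cohort 1: 1 customer
    (date(2014, 3, 1), 1), (date(2014, 3, 2), 3), (date(2014, 3, 3), 4),    # cohort 2: 3 customers
]

print(churn_by_cohort(rows))  # {1: 0.5, 2: -2.0}
```

A negative value means the later cohort has more distinct customers than the earlier one.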

GROUP BY several hours

I have a table where our product records its activity log. The product starts working at 23:00 every day and usually works one or two hours. This means that once a batch started at 23:00, it finishes about 1:00am next day.
Now, I need to collect statistics on how many posts are registered per batch, but I cannot figure out a script that would let me achieve this. So far I have the following SQL code:
SELECT COUNT(*), DATEPART(DAY,registrationtime),DATEPART(HOUR,registrationtime)
FROM RegistrationMessageLogEntry
WHERE registrationtime > '2014-09-01 20:00'
GROUP BY DATEPART(DAY, registrationtime), DATEPART(HOUR,registrationtime)
ORDER BY DATEPART(DAY, registrationtime), DATEPART(HOUR,registrationtime)
which results in following
count day hour
....
1189 9 23
8611 10 0
2754 10 23
6462 11 0
1885 11 23
I.e. I want the number for 9th 23:00 grouped with the number for 10th 00:00, 10th 23:00 with 11th 00:00 and so on. How could I do it?
You can do it very easily. Use DATEADD to add an hour to the original registrationtime. That moves all the registration times of a batch onto the same day, so you can simply group by the day part.
You could also do it in a more complicated way using CASE WHEN, but that is overkill given this easy solution.
I had to do something similar a few days ago. I had fixed timespans for work shifts to group by where one of them could start on one day at 10pm and end the next morning at 6am.
What I did was:
Define a "shift date": for every entry in the table, the day (with a zeroed time part) on which its shift started. I did this by checking whether the entry's timestamp fell between 0am and 6am; in that case I took only the date part of DATEADD(dd, -1, entryDate), which gives the previous day for all entries between 0am and 6am.
I also added an ID for the shift: 0 for the first one (6am to 2pm), 1 for the second one (2pm to 10pm) and 2 for the last one (10pm to 6am).
I was then able to group by the shift date and shift ID.
Example:
Consider the following source entries:
Timestamp SomeData
=============================
2014-09-01 06:01:00 5
2014-09-01 14:01:00 6
2014-09-02 02:00:00 7
Step one extended the table as follows:
Timestamp SomeData ShiftDay
====================================================
2014-09-01 06:01:00 5 2014-09-01 00:00:00
2014-09-01 14:01:00 6 2014-09-01 00:00:00
2014-09-02 02:00:00 7 2014-09-01 00:00:00
Step two extended the table as follows:
Timestamp SomeData ShiftDay ShiftID
==============================================================
2014-09-01 06:01:00 5 2014-09-01 00:00:00 0
2014-09-01 14:01:00 6 2014-09-01 00:00:00 1
2014-09-02 02:00:00 7 2014-09-01 00:00:00 2
If you add one hour to registrationtime, you will be able to group by the date part:
GROUP BY
CAST(DATEADD(HOUR, 1, registrationtime) AS date)
If the starting hour must be reflected accurately in the output (as 9, 23, 10, 23 rather than as 10, 0, 11, 0), you could obtain it as MIN(registrationtime) in the SELECT clause:
SELECT
count = COUNT(*),
day = DATEPART(DAY, MIN(registrationtime)),
hour = DATEPART(HOUR, MIN(registrationtime))
Finally, in case you are not aware, you can reference columns by their aliases in ORDER BY:
ORDER BY
day,
hour
just so that you do not have to repeat the expressions.
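The add-one-hour trick is easy to try out with Python's built-in sqlite3 module. This is only a sketch: SQLite's `date(registrationtime, '+1 hour')` stands in for `CAST(DATEADD(HOUR, 1, registrationtime) AS date)`, and the seven log rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE RegistrationMessageLogEntry (registrationtime TEXT)")

# Made-up entries: two batches, each starting at 23:00 and running past midnight
conn.executemany(
    "INSERT INTO RegistrationMessageLogEntry VALUES (?)",
    [
        ("2014-09-09 23:10:00",), ("2014-09-09 23:50:00",), ("2014-09-10 00:30:00",),
        ("2014-09-10 23:05:00",), ("2014-09-11 00:15:00",), ("2014-09-11 00:45:00",),
        ("2014-09-11 00:59:00",),
    ],
)

# Shifting every timestamp by one hour puts 23:xx and the following 00:xx
# on the same calendar day, so one GROUP BY covers the whole batch.
batches = conn.execute("""
    SELECT date(registrationtime, '+1 hour') AS batch_day,
           MIN(registrationtime)             AS first_entry,
           COUNT(*)                          AS entries
    FROM RegistrationMessageLogEntry
    GROUP BY date(registrationtime, '+1 hour')
    ORDER BY batch_day
""").fetchall()

for row in batches:
    print(row)
# ('2014-09-10', '2014-09-09 23:10:00', 3)
# ('2014-09-11', '2014-09-10 23:05:00', 4)
```

The shift only works while a batch never runs past 23:00 of the following day; a batch that could start before 23:00 would need a larger offset.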
The query below will give you what you are expecting:
;WITH CTE AS
(
SELECT COUNT(*) Count, DATEPART(DAY,registrationtime) Day,DATEPART(HOUR,registrationtime) Hour,
RANK() over (partition by DATEPART(HOUR,registrationtime) order by DATEPART(DAY,registrationtime),DATEPART(HOUR,registrationtime)) Batch_ID
FROM RegistrationMessageLogEntry
WHERE registrationtime > '2014-09-01 20:00'
GROUP BY DATEPART(DAY, registrationtime), DATEPART(HOUR,registrationtime)
)
SELECT SUM(COUNT) Count,Batch_ID
FROM CTE
GROUP BY Batch_ID
ORDER BY Batch_ID
You can write a CASE expression as below
CASE WHEN DATEPART(HOUR,registrationtime) = 23
THEN DATEPART(DAY,registrationtime)+1
ELSE DATEPART(DAY,registrationtime)
END,
CASE WHEN DATEPART(HOUR,registrationtime) = 23
THEN 0
ELSE DATEPART(HOUR,registrationtime)
END

Postgres sql 8.4 Use of Time Difference and Date Difference Separately

I have this question and I cannot resolve it; I think it may be impossible in SQL.
I have these tables:
Schedule
id_emp name time_initial time_end
1 Juan 09:00 12:00
2 Francisco 10:00 11:30
3 Sebastian 11:00 15:00
6 Roberto 15:00 18:00
Suspension
id_emp suspension_initial suspension_end
1 2013-06-01 2013-06-01
2 2013-06-01 2013-06-03
3 2013-06-03 2013-06-04
6 2013-06-01 2013-06-01
2 2013-07-01 2013-07-01
3 2013-07-05 2013-07-05
1 2013-07-06 2013-07-06
I want to get the hours worked, (time_end - time_initial) minus suspensions, where each day of suspension costs one day's worth of hours. Example: Juan works 3 hours per day and has 1 day of suspension in June and 1 day of suspension in July. Assuming 20 working days per month, he works 3 * 20 hours (hours * days worked) minus 3 hours in June, and likewise minus 3 hours in July.
How can I get this result?
id_emp name June-2013 July-2013
1 Juan 57 (hours Worked) 57 (hours Worked)
2 Francisco 24 (hours worked) 27 (hours worked)
3 Sebastián
6 Roberto
Here is the SQLFiddle demo.
Below is a query you can try:
select EmpHrs.ID_EMP,
EmpHrs.Name,
(
(EmpHrs.NOOFHRS*20)-
(EmpHrs.NOOFHRS*
JuneSuspension.MONTHSUSPENSION)
) as "June-2013",
(
(EmpHrs.NOOFHRS*20)-
(EmpHrs.NOOFHRS*
JulySuspension.MONTHSUSPENSION)
) as "July-2013"
from
(
select ID_EMP,NAME,
Extract(Hours from time_end-time_initial)+
Extract(Minutes from time_end-time_initial)/60 as NoOfHrs
from schedule
) EmpHrs
Left join
(select ID_EMP,to_char(to_timestamp (Extract(Month from suspension_initial)::text, 'MM'),'Mon') as MonthIni,(suspension_end::date - suspension_initial::date)+1 MonthSuspension
from suspension
where Extract(Month from suspension_initial) = 6) JuneSuspension
On JuneSuspension.ID_EMP = EmpHrs.ID_EMP
Left join
(select ID_EMP,to_char(to_timestamp (Extract(Month from suspension_initial)::text, 'MM'),'Mon') as MonthIni,(suspension_end::date - suspension_initial::date)+1 MonthSuspension
from suspension
where Extract(Month from suspension_initial) = 7) JulySuspension
On JulySuspension.ID_EMP = EmpHrs.ID_EMP
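The arithmetic the query performs (daily hours times 20 working days, minus daily hours times suspended days) can be checked with a short Python sketch. The helper names are hypothetical, the 20 working days per month come from the question, and a suspension is attributed to the month in which it starts, as in the query. Unlike a LEFT JOIN that finds no matching row, the sketch treats a month without suspensions as zero suspended days.

```python
from datetime import date, datetime

WORKDAYS_PER_MONTH = 20  # assumption taken from the question


def daily_hours(time_initial, time_end):
    """Length of the daily schedule in hours, from 'HH:MM' strings."""
    fmt = "%H:%M"
    delta = datetime.strptime(time_end, fmt) - datetime.strptime(time_initial, fmt)
    return delta.total_seconds() / 3600


def suspended_days(suspensions, year, month):
    """Suspended days in a month, counting both end points of each
    suspension and attributing it to the month in which it starts."""
    return sum(
        (end - start).days + 1
        for start, end in suspensions
        if (start.year, start.month) == (year, month)
    )


def hours_worked(time_initial, time_end, suspensions, year, month):
    per_day = daily_hours(time_initial, time_end)
    return per_day * (WORKDAYS_PER_MONTH - suspended_days(suspensions, year, month))


# Rows for id_emp 1 (Juan) and id_emp 6 (Roberto) from the tables in the question
juan = [(date(2013, 6, 1), date(2013, 6, 1)), (date(2013, 7, 6), date(2013, 7, 6))]
roberto = [(date(2013, 6, 1), date(2013, 6, 1))]

print(hours_worked("09:00", "12:00", juan, 2013, 6))     # 57.0
print(hours_worked("09:00", "12:00", juan, 2013, 7))     # 57.0
print(hours_worked("15:00", "18:00", roberto, 2013, 7))  # 60.0
```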