Averages are too high when getting data from over a month

I was asked to alter a query to work with data from a given date range instead of just the current month. The query should get the average sales per hour during that date range. It appears to work just fine when selecting one month of data, but when I try to go over a month, the averages appear to be higher than they ought to be.
I think the problem may have to do with grouping by the day, since the day would be doubled up when the data spans more than a month, but how would I go about fixing it? Thanks in advance.
DECLARE @Start DATETIME
DECLARE @End DATETIME
SET @Start = '6/15/2015'
SET @End = '8/15/2015'
SELECT TheHour, AVG(TheCount) AS SalesPerHour
FROM
(SELECT DATEPART(DAY, DateTimeCreated) AS TheDay,
DATEPART(HOUR, DateTimeCreated) AS TheHour,
COUNT(*) AS TheCount
FROM OrderHeader
WHERE Deleted = 0
AND OrderType = 1
AND BranchID = 4
AND BackOrderedFromID IS NULL
AND DateTimeCreated >= @Start
AND DateTimeCreated < @End
GROUP BY DATEPART(DAY, DateTimeCreated), DATEPART(HOUR, DateTimeCreated)) AS T
GROUP BY TheHour
ORDER BY TheHour
SAMPLE DATA for 6/15/2015 to 7/15/2015
TheHour SalesPerHour
5 2
6 5
7 6
8 5
9 4
10 4
11 2
12 2
13 3
14 2
15 2
16 1
SAMPLE DATA for 7/15/2015 to 8/15/2015
TheHour SalesPerHour
5 1
6 7
7 6
8 5
9 4
10 4
11 4
12 2
13 4
14 2
15 1
SAMPLE DATA for 6/15/2015 to 8/15/2015 (most values are too high?)
TheHour SalesPerHour
5 2
6 10
7 11
8 8
9 7
10 6
11 5
12 3
13 5
14 4
15 2
16 1

Don't use datepart(day). This gives the day of the month. When your time frame spans multiple months, datepart(day) returns the same value for different days (for instance, "1" on the first of any month).
Instead, simply cast the value to a date to remove the time component. The rest of the query remains the same:
SELECT TheHour, AVG(TheCount) AS SalesPerHour
FROM (SELECT CAST(DateTimeCreated as Date) AS TheDay,
DATEPART(HOUR, DateTimeCreated) AS TheHour,
COUNT(*) AS TheCount
FROM OrderHeader
WHERE Deleted = 0 AND OrderType = 1 AND BranchID = 4 AND
BackOrderedFromID IS NULL AND
DateTimeCreated >= @Start AND
DateTimeCreated < @End
GROUP BY CAST(DateTimeCreated as Date), DATEPART(HOUR, DateTimeCreated)
) dh
GROUP BY TheHour
ORDER BY TheHour;
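To see why grouping on the day of the month inflates the counts, here is a minimal sketch using a table variable with made-up rows (the @Orders table and its three timestamps are hypothetical, not taken from the question). Two different dates share day number 15, so the first query merges them into one group while the second keeps them apart.

```sql
-- Hypothetical sample data: two orders on 2015-06-15 and one on 2015-07-15,
-- all of them in the 9 AM hour.
DECLARE @Orders TABLE (DateTimeCreated DATETIME);
INSERT INTO @Orders VALUES
    ('2015-06-15T09:05:00'),
    ('2015-06-15T09:40:00'),
    ('2015-07-15T09:10:00');

-- Day of month: both dates collapse into "day 15".
-- Returns one row: TheDay = 15, TheHour = 9, TheCount = 3
SELECT DATEPART(DAY, DateTimeCreated) AS TheDay,
       DATEPART(HOUR, DateTimeCreated) AS TheHour,
       COUNT(*) AS TheCount
FROM @Orders
GROUP BY DATEPART(DAY, DateTimeCreated), DATEPART(HOUR, DateTimeCreated);

-- Calendar date: the two dates stay separate.
-- Returns two rows: (2015-06-15, 9, 2) and (2015-07-15, 9, 1)
SELECT CAST(DateTimeCreated AS DATE) AS TheDay,
       DATEPART(HOUR, DateTimeCreated) AS TheHour,
       COUNT(*) AS TheCount
FROM @Orders
GROUP BY CAST(DateTimeCreated AS DATE), DATEPART(HOUR, DateTimeCreated);
```

Averaging the first result gives 3 for the 9 AM hour, while averaging the second gives the per-day figure.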
Alternatively, you can do this without the double aggregation:
SELECT DATEPART(HOUR, DateTimeCreated) as TheHour,
(COUNT(*) * 1.0 /
COUNT(DISTINCT CAST(DateTimeCreated as Date))
) as SalesPerHour
FROM OrderHeader oh
WHERE Deleted = 0 AND OrderType = 1 AND BranchID = 4 AND
BackOrderedFromID IS NULL AND
DateTimeCreated >= @Start AND
DateTimeCreated < @End
GROUP BY DATEPART(HOUR, DateTimeCreated);
Also, note that AVG() of an integer value does an integer average. So, the average of 1 and 2 is 1 in SQL Server, not 1.5. In this version the query multiplies the count by 1.0 to get decimal places -- that may or may not be desirable.
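A small sketch of the integer-average behaviour described above, using a hypothetical table variable @Counts that holds just the values 1 and 2:

```sql
-- Hypothetical data: the two counts 1 and 2.
DECLARE @Counts TABLE (TheCount INT);
INSERT INTO @Counts VALUES (1), (2);

SELECT AVG(TheCount)                          AS IntegerAvg,  -- 1 (fraction is dropped)
       AVG(TheCount * 1.0)                    AS DecimalAvg,  -- 1.5 (as a decimal)
       AVG(CAST(TheCount AS DECIMAL(10, 2)))  AS CastAvg      -- 1.5 (as a decimal)
FROM @Counts;
```

Either the multiplication by 1.0 or an explicit CAST moves the calculation out of integer arithmetic; which one reads better is a matter of taste.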

To round a datetime down to its nearest whole hour, use DATEADD and DATEDIFF together:
DECLARE @Start DATETIME
DECLARE @End DATETIME
SET @Start = '6/15/2015'
SET @End = '8/15/2015'
SELECT DATEPART(hour,RoundedHour) as Hour, AVG(TheCount) AS SalesPerHour
FROM
(SELECT DATEADD(hour,DATEDIFF(hour,0,DateTimeCreated),0) as RoundedHour,
COUNT(*) AS TheCount
FROM OrderHeader
WHERE Deleted = 0
AND OrderType = 1
AND BranchID = 4
AND BackOrderedFromID IS NULL
AND DateTimeCreated >= @Start
AND DateTimeCreated < @End
GROUP BY DATEADD(hour,DATEDIFF(hour,0,DateTimeCreated),0)) AS T
GROUP BY DATEPART(hour,RoundedHour)
ORDER BY DATEPART(hour,RoundedHour)
That way you don't have to think about all of the larger components (day, month, year) that you'd also want to group by, for larger ranges.
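As a quick illustration of the DATEADD/DATEDIFF pattern on its own, here is a sketch with a single made-up timestamp (the variable @d is hypothetical). DATEDIFF counts the whole hour boundaries between the base date 0 (1900-01-01) and the value, and DATEADD adds that many hours back to the base date, which drops the minutes and seconds.

```sql
-- Hypothetical timestamp used only to show the truncation.
DECLARE @d DATETIME = '2015-07-15T09:47:31';

SELECT DATEDIFF(hour, 0, @d)                    AS HoursSinceBase,  -- hour boundaries since 1900-01-01
       DATEADD(hour, DATEDIFF(hour, 0, @d), 0)  AS RoundedHour;     -- 2015-07-15 09:00:00.000
```

Because RoundedHour still carries the full date, two orders in the same hour on different days (or months, or years) never end up in the same group.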

Since your query is using DAY as the datepart, you're effectively adding together the number of sales in each hour for every date that shares the same day-of-month number before getting the averages. For example, if a salesperson has 10 sales in the 5pm hour on Jan. 1st and 12 sales in the 5pm hour on Feb. 1st, then you're going to get an intermediate value of 22 sales for "day 1". You then average those combined totals over the day-of-month numbers (1 to 31) rather than over the actual calendar days, so the averages come out too high.
You could use the DATEPART of DY (day of year) instead, but then your query would experience the same issue if you started to span years. Instead, just CAST the DATETIME as a DATE to get rid of the time portion, or even better yet, use a windowed function to get your numbers, like so:
;WITH CTE_HourBreakdown AS
(
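SELECT DISTINCT -- one row per calendar day and hour, so busy hours are not counted once per order
CAST(DateTimeCreated AS DATE) AS dt,
DATEPART(HOUR, DateTimeCreated) AS hr,
COUNT(*) OVER (PARTITION BY YEAR(DateTimeCreated), DATEPART(DY, DateTimeCreated), DATEPART(HOUR, DateTimeCreated)) AS cnt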
FROM
OrderHeader
)
SELECT
hr,
AVG(CAST(cnt AS DECIMAL(10, 2)))
FROM
CTE_HourBreakdown
GROUP BY
hr
There's likely a better way to do this with windowed functions, but this was the first thing that came to me. Also, note that if there are no sales in an hour this method does NOT average that into the results. For example, if on one day between 4pm and 5pm there are no sales and the next day there are 2 sales this will show an average of 2 sales between 4pm and 5pm instead of 1 sale on average. If you want to account for that then you'll need a method to distinguish zero-sale hours from hours when no one is working.
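One possible way to count zero-sale hours, sketched under a loud assumption: every calendar day in the range is treated as a working day for every hour. Instead of averaging per-day counts, the total for each hour is divided by the number of days in the range. The table and filters are the ones from the question; the divisor is the assumption.

```sql
DECLARE @Start DATETIME = '2015-06-15';
DECLARE @End   DATETIME = '2015-08-15';

-- ASSUMPTION: every day between @Start (inclusive) and @End (exclusive)
-- counts as a day on which sales could have happened in every hour.
SELECT DATEPART(HOUR, DateTimeCreated) AS TheHour,
       COUNT(*) * 1.0 / DATEDIFF(DAY, @Start, @End) AS SalesPerHour
FROM OrderHeader
WHERE Deleted = 0
  AND OrderType = 1
  AND BranchID = 4
  AND BackOrderedFromID IS NULL
  AND DateTimeCreated >= @Start
  AND DateTimeCreated < @End
GROUP BY DATEPART(HOUR, DateTimeCreated)
ORDER BY TheHour;
```

An hour that has no sales on any day in the range still produces no row at all, and closed days would drag the averages down, so a calendar of working days would be needed for anything more precise.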

Related

Extract the record for last hour for specific date

I am trying to extract the record for the last hour (TKT_DT) with the number of tickets (TKT_NO) from the sales table (PS_TKT_HIST) for a specific date (BUS_DAT).
I have the following code, but it returns the number of tickets (TKT_NO) for each hour. I want only the last hour. Here is the code I used:
Select count(TKT_NO) AS SAL_TKTS,
DATEPART(HOUR, (TKT_DT))AS SAL_HR
FROM PS_TKT_HIST
WHERE BUS_DAT = '2022-03-30'
GROUP By DATEPART (HOUR, TKT_DT)
I get the following results
SAL_TKTS SAL_HR
5 10
1 11
3 12
5 13
10 14
13 15
23 16
18 17
12 18
6 19
6 20
4 21
I want to get only the record (4) for the last hour (21)

If you just want the number of tickets in the last hour on a given day:
DECLARE @date date = '20220330';
SELECT COUNT(*)
FROM dbo.PS_TKT_HIST
WHERE BUS_DAT = @date
AND TKT_DT >= DATEADD(HOUR, 23, CONVERT(datetime2, @date));
For any hour other than the last hour (let's say, the 9PM hour):
WHERE BUS_DAT = @date
AND TKT_DT >= DATEADD(HOUR, 21, CONVERT(datetime2, @date))
AND TKT_DT < DATEADD(HOUR, 22, CONVERT(datetime2, @date));
If by "last hour" you don't mean 11 PM but rather the last hour there was a sale, you would have to do something like this:
DECLARE @date date = '20220330';
SELECT TOP (1) COUNT(*)
FROM dbo.PS_TKT_HIST
WHERE BUS_DAT = @date
GROUP BY DATEPART(HOUR, TKT_DT)
ORDER BY DATEPART(HOUR, TKT_DT) DESC;
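To make the "last hour with a sale" variant concrete, here is a sketch against a table variable with made-up tickets (the @PS_TKT_HIST rows are hypothetical; the column names follow the question). It also returns the hour itself, which the question's expected output includes.

```sql
-- Hypothetical tickets: one in the 8 PM hour, two in the 9 PM hour.
DECLARE @PS_TKT_HIST TABLE (TKT_NO INT, TKT_DT DATETIME, BUS_DAT DATE);
INSERT INTO @PS_TKT_HIST VALUES
    (1, '2022-03-30T20:15:00', '2022-03-30'),
    (2, '2022-03-30T21:05:00', '2022-03-30'),
    (3, '2022-03-30T21:50:00', '2022-03-30');

-- Group by hour, sort the hours descending, keep only the first group.
-- Returns one row: SAL_TKTS = 2, SAL_HR = 21
SELECT TOP (1)
       COUNT(*) AS SAL_TKTS,
       DATEPART(HOUR, TKT_DT) AS SAL_HR
FROM @PS_TKT_HIST
WHERE BUS_DAT = '2022-03-30'
GROUP BY DATEPART(HOUR, TKT_DT)
ORDER BY SAL_HR DESC;
```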

Calculate Experience without overlapping

I'm trying to come up with the correct query to calculate employment experience time, but I can't get it right. Here's the data I have:
Case 1:
EmployeeID PositionID StartDate EndDate
1 15 5/22/2017 5/22/2018
1 17 7/14/2018 8/10/2019
Case 2:
EmployeeID PositionID StartDate EndDate
1 15 5/22/2017 8/10/2019
1 17 3/8/2019 8/10/2019
Case 3:
EmployeeID PositionID StartDate EndDate
1 15 5/22/2017 NULL
1 17 3/8/2019 NULL
In the first case, my expected result in months would be: 27 months for both positions.
In the second case, my expected result in months would be: 27 months for PositionID 15 and 0 months for PositionID 17, because PositionID 17 falls within the date range of the first position, so the employee is not awarded any additional experience for it.
In the third case, my expected result in months would be: 30 months for PositionID 15, using today's date as the end date, and 0 months for PositionID 17, because PositionID 17 again falls within the date range of the first position, so the employee is not awarded any additional experience for it.

You don't have any gaps, so I think this does what you want:
select employeeid,
datediff(month, min(startdate), coalesce(max(enddate), getdate())) as months
from t
group by employeeid;
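As a check, here is the same query run against Case 1, sketched with a table variable (the name @t and the ISO date literals are my own; the values are from the question). DATEDIFF(month, ...) counts month boundaries crossed, so May 2017 to August 2019 gives 27.

```sql
-- Case 1 from the question, loaded into a table variable.
DECLARE @t TABLE (EmployeeID INT, PositionID INT, StartDate DATE, EndDate DATE);
INSERT INTO @t VALUES
    (1, 15, '2017-05-22', '2018-05-22'),
    (1, 17, '2018-07-14', '2019-08-10');

-- Earliest start to latest end per employee; GETDATE() is used only
-- when MAX(EndDate) is NULL, i.e. when every EndDate is NULL.
-- Returns one row: EmployeeID = 1, months = 27
SELECT EmployeeID,
       DATEDIFF(month, MIN(StartDate), COALESCE(MAX(EndDate), GETDATE())) AS months
FROM @t
GROUP BY EmployeeID;
```

Note that MAX ignores NULLs, so if an employee had one open-ended row and one closed row, the closed row's EndDate would be used rather than today's date. That situation does not occur in the three cases shown.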

This is what I have:
Your table 1:
select 1 as EmployeeID , 15 as PositionID , cast('5/22/2017' as date) as StartDate, cast('5/22/2018' as date) as EndDate into t2
union select 1, 17, '7/14/2018', '8/10/2019'
And the query to get the result
with a as
(
select EmployeeID, isnull(StartDate, cast(getdate() as date)) as sedate from t2
union
select EmployeeID, isnull(EndDate, cast(getdate() as date)) from t2
)
select a1.*, a2.sedate, case when datediff(month,a1.sedate, a2.sedate)< 0 then 0 else isnull(datediff(month,a1.sedate, a2.sedate), 0) end as months from a a1 left join a a2 on a1.EmployeeID = a2.EmployeeID and a1.sedate < a2.sedate
and not exists(select 1 from a a3 where a3.EmployeeID = a2.EmployeeID and a3.sedate > a1.sedate and a3.sedate < a2.sedate )
I changed the table to the values of Case2 and Case 3 and it seemed to work.
Let us know if that helps

sql command to find average count of user visits to a website from past 6 months

I have a table with 2 columns, Date and number of visits.
I need to calculate the average difference in visit counts by month over the past 6 months.
Date Number_of_Visits
2018-04-06 5
2018-02-06 6
2017-04-10 3
2017-02-10 9
The SQL should output the average difference in visit counts over the past 6 months:
5 - 3 = 2
6 - 9 = -3
(-3 + 2) / 2 = -0.5
The SQL query output should be -0.5.
This is the SQL I have so far:
With cte as (
SELECT Year(v1.date) as Year, Month(v1.date) as Month, sum(v1.visits) as SumCount
FROM visits_table v1
group by Year(v1.date), Month(v1.date)
)

Do you want the average of the differences for the same month across years, i.e. a year-on-year comparison?
This will give you the result that you want, -0.5:
; With
cte as
(
SELECT Year(v1.date) as Year, Month(v1.date) as Month, sum(v1.visits) as SumCount
FROM visits_table v1
WHERE v1.date >= DATEADD(MONTH, -6, GETDATE()) -- Add here
group by Year(v1.date), Month(v1.date)
)
SELECT AVG (diff * 1.0)
FROM
(
SELECT *, diff = SumCount
- LAG (SumCount) OVER (PARTITION BY Month
ORDER BY Year)
FROM cte
) d
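To trace the LAG step on the question's four rows, here is a sketch with the data loaded into a table variable (the name @visits is hypothetical, and the six-month filter is left out because the sample dates are older than that). LAG returns NULL for the first year of each month, and AVG skips those NULLs.

```sql
-- The four sample rows from the question.
DECLARE @visits TABLE ([date] DATE, visits INT);
INSERT INTO @visits VALUES
    ('2018-04-06', 5),
    ('2018-02-06', 6),
    ('2017-04-10', 3),
    ('2017-02-10', 9);

WITH cte AS
(
    SELECT YEAR([date]) AS Yr, MONTH([date]) AS Mo, SUM(visits) AS SumCount
    FROM @visits
    GROUP BY YEAR([date]), MONTH([date])
)
-- February: 6 - 9 = -3, April: 5 - 3 = 2, average = -0.5
SELECT AVG(diff * 1.0) AS AvgDiff
FROM
(
    SELECT SumCount - LAG(SumCount) OVER (PARTITION BY Mo ORDER BY Yr) AS diff
    FROM cte
) d;
```

LAG needs SQL Server 2012 or later.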

SQL - How to count records for each status in one line per day?

I have a table Sales
Sales
--------
id
FormUpdated
TrackingStatus
There are several status e.g. Complete, Incomplete, SaveforLater, ViewRates etc.
I want to have my results in this form for the last 8 days(including today).
Expected Result:
Date Part of FormUpdated, Day of Week, Counts of ViewRates, Counts of Sales(complete), Counts of SaveForLater
--------------------------------------
2015-05-19 Tuesday 3 1 21
2015-05-18 Monday 12 5 10
2015-05-17 Sunday 6 1 8
2015-05-16 Saturday 5 3 7
2015-05-15 Friday 67 5 32
2015-05-14 Thursday 17 0 5
2015-05-13 Wednesday 22 0 9
2015-05-12 Tuesday 19 2 6
Here is my sql query:
select datename(dw, FormUpdated), count(ID), TrackingStatus
from Sales
where FormUpdated <= GETDATE()
AND FormUpdated >= GetDate() - 8
group by datename(dw, FormUpdated), TrackingStatus
order by datename(dw, FormUpdated) desc
I do not know how to make the next step.
Update
I forgot to mention, I only need the Date part of the FormUpdated, not all parts.

You can use SUM(CASE WHEN TrackingStatus = 'SomeTrackingStatus' THEN 1 ELSE 0 END) to get the count for each tracking status in its own column. Something like this:
select
CONVERT(DATE,FormUpdated) FormUpdated,
DATENAME(dw, CONVERT(DATE,FormUpdated)),
SUM(CASE WHEN TrackingStatus = 'ViewRates' THEN 1 ELSE 0 END) c_ViewRates,
SUM(CASE WHEN TrackingStatus = 'Complete' THEN 1 ELSE 0 END) c_Complete,
SUM(CASE WHEN TrackingStatus = 'SaveforLater' THEN 1 ELSE 0 END) c_SaveforLater
from Sales
where FormUpdated <= GETDATE()
AND FormUpdated >= DATEADD(D,-8,GetDate())
group by CONVERT(DATE,FormUpdated)
order by CONVERT(DATE,FormUpdated) desc

You can also use a PIVOT to achieve this result - you'll just need to complete the list of TrackingStatus names in both the SELECT and the FOR, and no GROUP BY required:
WITH DatesOnly AS
(
SELECT Id, CAST(FormUpdated AS DATE) AS DateOnly, DATENAME(dw, FormUpdated) AS DayOfWeek, TrackingStatus
FROM Sales
)
SELECT DateOnly, DayOfWeek,
-- List of Pivoted Columns
[Complete],[Incomplete], [ViewRates], [SaveforLater]
FROM DatesOnly
PIVOT
(
COUNT(Id)
-- List of Pivoted columns
FOR TrackingStatus IN([Complete],[Incomplete], [ViewRates], [SaveforLater])
) pvt
WHERE DateOnly <= GETDATE() AND DateOnly >= GetDate() - 8
ORDER BY DateOnly DESC
Also, I think your ORDER BY is wrong - it should just be the Date, not day of week.

Totals over rolling timeframe

I have my data arranged like this:
obj_id quantity date
1 3 2014-05-06
2 2 2014-03-12
3 5 2014-10-07
4 7 2014-05-09
2 8 2014-12-31
1 5 2014-01-16
4 1 2014-07-26
3 2 2014-09-15
...
What I need is to find the OBJ_ID's that have the SUM(quantity) > MAX over the period of RANGE days.
In my case MAX is 18 and RANGE is 31 days.
In other words, every given OBJ_ID receives a QUANTITY (of whatever) from time to time. I need to find the OBJ_IDs that received more than 18 in total, where the dates on which those quantities were received span less than 31 days.
I think I need to use LAG here, but not sure how the whole thing should be.
Thanks in advance.

This might need some tweaking as I didn't have the time to decently test it, but maybe it'll get you on the right track:
(I've assumed you want the records where the date is within the last 31 days)
-- RANGE and MAX stand for your own values (31 and 18 in the question)
SELECT obj_id, SUM(quantity)
FROM tblTable
WHERE date between DATEADD(day, -RANGE, GETDATE()) and GETDATE()
GROUP BY obj_id
HAVING SUM(quantity) > MAX

I'm currently testing a solution a colleague of mine has quickly put together:
SELECT A.*
FROM (
SELECT A.obj_id
, A.date
, A.in_month_date
, A.date - A.in_month_date AS in_month
, A.quantity
, A.in_month_quantity
FROM (
SELECT A.obj_id
, A.date
, FIRST_VALUE(A.date)
OVER (
PARTITION BY A.obj_id
ORDER BY A.date
RANGE BETWEEN 31 PRECEDING
AND CURRENT ROW
) AS in_month_date
, A.quantity
, SUM(A.quantity)
OVER (
PARTITION BY A.obj_id
ORDER BY A.date
RANGE BETWEEN 31 PRECEDING
AND CURRENT ROW
) AS in_month_quantity
FROM mytable A
) A
) A
WHERE A.in_month <= 31
AND A.in_month_quantity > 18
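For comparison, here is a sketch of the same rolling-window idea in SQL Server syntax, where a RANGE frame does not accept a numeric offset such as 31 PRECEDING. It uses CROSS APPLY to sum, for each row, the quantities of the same obj_id received in the 31 days up to and including that row's date. The table variable @t and its rows are hypothetical, chosen so that one object exceeds the limit of 18.

```sql
-- Hypothetical data: obj 1 receives 19 within 20 days, obj 2 receives 19 over 10 weeks.
DECLARE @t TABLE (obj_id INT, quantity INT, [date] DATE);
INSERT INTO @t VALUES
    (1, 10, '2014-05-01'),
    (1,  9, '2014-05-20'),
    (2, 10, '2014-01-01'),
    (2,  9, '2014-03-15');

-- For each row, total the same object's quantities in the preceding 31 days.
-- Returns one row: obj_id = 1
SELECT DISTINCT a.obj_id
FROM @t a
CROSS APPLY
(
    SELECT SUM(b.quantity) AS window_quantity
    FROM @t b
    WHERE b.obj_id = a.obj_id
      AND b.[date] >  DATEADD(day, -31, a.[date])
      AND b.[date] <= a.[date]
) w
WHERE w.window_quantity > 18;
```

Whether the window boundary should be inclusive or exclusive depends on how "less than 31 days" is meant; the comparison operators above are one reading of it.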
