How to select the Average per WORK Day/Week/Month/60 Days/90 Days when not all days are in table? - sql

OK, So after a lot of research I am still stuck. I get the simple basics:
I can select average of all values using a group by statement. grouping where the dates are within my range and by the average I want.
Select AVG(Qty)
From (
Select Sum(ItemQty) As Qty
From ReceivedInfo
Where DateUsed >= '2014-04-01'
AND DateUsed < '2014-04-30'
Group By Date(DateUsed)
)
the Issue here is that I need to include days that aren't in the table too. All Workdays should be accounted for even days that don't have an entry. I am not sure how to get an average of Qty per day (or month, or 60 days and so on) including all, and only, workdays from a data range. Exclude Weekends.
Would it be enough to do something like:
Select Sum(QTy) / (julianday('2014-04-30') - julianday('2014-04-01' )) - (((julianday('2014-04-30') - julianday('2014-04-01' )) / 7 ) * 2) As Average
From (
Select Sum(ItemQty) As Qty
From ReceivedInfo
Where DateUsed >= '2014-04-01'
AND DateUsed < '2014-04-30'
)
Or Do I have to determine the number of work days outside of the query?

A while back I saw someone do something rather interesting with a CTE that solved a similar problem: perhaps this would work for you?
WITH DateRange(Date) AS
(
SELECT '2014-04-01' Date
UNION ALL
SELECT DATEADD(day, 1, Date) Date
FROM DateRange
WHERE Date < '2014-04-30'
)
SELECT CONVERT(VARCHAR(10),Date, 121) -- either SELECT from or JOIN to this table
FROM DateRange
WHERE DatePart(dw, Date) NOT IN (1,7) -- this could limit it to Mon-Fri only
EDIT: I think that the following would be the linkage for you to include this in your resultset:
Select AVG(Qty)
From (
Select Sum(ItemQty) As Qty
From
DateRange
LEFT JOIN
ReceivedInfo ON
DateRange.Date = ReceivedInfo.DateUsed
Where DateRange.Date >= '2014-04-01'
AND DateRange.Date < '2014-04-30'
Group By DateRange.Date
)
EDIT 2: removed variables as per commenter below. Thanks for the reminder!

Your final query should work fine if you want the missing values to be treated as 0. You can, however, simplify it. So, to a close approximation:
Select Sum(QTy) / (5.0/7 * (julianday('2014-04-30') - julianday('2014-04-01' ) ) )
From ReceivedInfo
Where DateUsed >= '2014-04-01' AND DateUsed < '2014-04-30';
Of course, this is just an approximation of the number of work days (which is similar to your proposed solution). A full solution would use cumbersome day-of-week arithmetic.

Related

How to count records for each day in a range (including days without records)

I'm trying to refine this question a little since I didn't really ask correctly last time. I am essentially doing this query:
Select count(orders)
From Orders_Table
Where Order_Open_Date<=To_Date('##/##/####','MM/DD/YYYY')
and Order_Close_Date>=To_Date('##/##/####','MM/DD/YYYY')
Where ##/##/#### is the same day. In essence this query is designed to find the number of 'open' orders on any given day. The only problem is I'm wanting to do this for each day of a year or more. I think if I knew how to define the ##/##/#### as a variable and then grouped the count by that variable then I could get this to work but I'm not sure how to do that-or there may be another way as well. I am currently using Oracle SQL on SQL developer. Thanks for any input.
You could use a "row generator" technique like this (edited for Hogan's comments):
Select RG.Day,
count(orders)
From Orders_Table,
(SELECT trunc(SYSDATE) - ROWNUM as Day
FROM (SELECT 1 dummy FROM dual)
CONNECT BY LEVEL <= 365
) RG
Where RG.Day <=To_Date('##/##/####','MM/DD/YYYY')
and RG.Day >=To_Date('##/##/####','MM/DD/YYYY')
and Order_Open_Date(+) <= RG.Day
and Order_Close_Date(+) >= RG.Day - 1
Group by RG.Day
Order by RG.Day
This should list each day of the previous year with the corresponding number of orders
Lets say you had a table datelist with a column adate
aDate
1/1/2012
1/2/2012
1/3/2012
Now you join that to your table
Select *
From Orders_Table
join datelist on Order_Open_Date<=adate and Order_Close_Date>=adate
This gives you a list of all the orders you care about, now you group by and count
Select aDate, count(*)
From Orders_Table
join datelist on Order_Open_Date<=adate and Order_Close_Date>=adate
group by adate
If you want to pass in a parameters then just generate the dates with a recursive cte
with datelist as
(
select #startdate as adate
UNION ALL
select adate + 1
from datelist
where (adate + 1) <= #lastdate
)
Select aDate, count(*)
From Orders_Table
join datelist on Order_Open_Date<=adate and Order_Close_Date>=adate
group by adate
NOTE: I don't have an Oracle DB to test on so I might have some syntax wrong for this platform, but you get the idea.
NOTE2: If you want all dates listed with 0 for those that have nothing use this as your select statement:
Select aDate, count(Order_Open_Date)
From Orders_Table
left join datelist on Order_Open_Date<=adate and Order_Close_Date>=adate
group by adate
If you want only one day you can query using TRUNC like this
select count(orders)
From orders_table
where trunc(order_open_date) = to_date('14/05/2012','dd/mm/yyyy')

Add one for every row that fulfills where criteria between period

I have a Postgres table that I'm trying to analyze based on some date columns.
I'm basically trying to count the number of rows in my table that fulfill this requirement, and then group them by month and year. Instead of my query looking like this:
SELECT * FROM $TABLE WHERE date1::date <= '2012-05-31'
and date2::date > '2012-05-31';
it should be able to display this for the months available in my data so that I don't have to change the months manually every time I add new data, and so I can get everything with one query.
In the case above I'd like it to group the sum of rows which fit the criteria into the year 2012 and month 05. Similarly, if my WHERE clause looked like this:
date1::date <= '2012-06-31' and date2::date > '2012-06-31'
I'd like it to group this sum into the year 2012 and month 06.
This isn't entirely clear to me:
I'd like it to group the sum of rows
I'll interpret it this way: you want to list all rows "per month" matching the criteria:
WITH x AS (
SELECT date_trunc('month', min(date1)) AS start
,date_trunc('month', max(date2)) + interval '1 month' AS stop
FROM tbl
)
SELECT to_char(y.mon, 'YYYY-MM') AS mon, t.*
FROM (
SELECT generate_series(x.start, x.stop, '1 month') AS mon
FROM x
) y
LEFT JOIN tbl t ON t.date1::date <= y.mon
AND t.date2::date > y.mon -- why the explicit cast to date?
ORDER BY y.mon, t.date1, t.date2;
Assuming date2 >= date1.
Compute lower and upper border of time period and truncate to month (adding 1 to upper border to include the last row, too.
Use generate_series() to create the set of months in question
LEFT JOIN rows from your table with the declared criteria and sort by month.
You could also GROUP BY at this stage to calculate aggregates ..
Here is the reasoning. First, create a list of all possible dates. Then get the cumulative number of date1 up to a given date. Then get the cumulative number of date2 after the date and subtract the results. The following query does this using correlated subqueries (not my favorite construct, but handy in this case):
select thedate,
(select count(*) from t where date1::date <= d.thedate) -
(select count(*) from t where date2::date > d.thedate)
from (select distinct thedate
from ((select date1::date as thedate from t) union all
(select date2::date as thedate from t)
) d
) d
This is assuming that date2 occurs after date1. My model is start and stop dates of customers. If this isn't the case, the query might not work.
It sounds like you could benefit from the DATEPART T-SQL method. If I understand you correctly, you could do something like this:
SELECT DATEPART(year, date1) Year, DATEPART(month, date1) Month, SUM(value_col)
FROM $Table
-- WHERE CLAUSE ?
GROUP BY DATEPART(year, date1),
DATEPART(month, date1)

How do I get a maximium daily value of a numerical field over a year in SQL

How do I get a maximium daily value of a numerical field over a year in MS-SQL
This would query the daily maximum of value over 2008:
select
datepart(dayofyear,datecolumn)
, max(value)
from yourtable
where '2008-01-01' <= datecolumn and datecolumn < '2009-01-01'
group by datepart(dayofyear,datecolumn)
Or the daily maximum over each year:
select
datepart(year,datecolumn),
, datepart(dayofyear,datecolumn)
, max(value)
from yourtable
group by datepart(year,datecolumn), datepart(dayofyear,datecolumn)
Or the day(s) with the highest value in a year:
select
Year = datepart(year,datecolumn),
, DayOfYear = datepart(dayofyear,datecolumn)
, MaxValue = max(MaxValue)
from yourtable
inner join (
select
Year = datepart(year,datecolumn),
, MaxValue = max(value)
from yourtable
group by datepart(year,datecolumn)
) sub on
sub.Year = yourtable.datepart(year,datecolumn)
and sub.MaxValue = yourtable.value
group by
datepart(year,datecolumn),
datepart(dayofyear,datecolumn)
You didn't mention which RDBMS or SQL dialect you're using. The following will work with T-SQL (MS SQL Server). It may require some modifications for other dialects since date functions tend to change a lot between them.
SELECT
DATEPART(dy, my_date),
MAX(my_number)
FROM
My_Table
WHERE
my_date >= '2008-01-01' AND
my_date < '2009-01-01'
GROUP BY
DATEPART(dy, my_date)
The DAY function could be any function or combination of functions which gives you the days in the format that you're looking to get.
Also, if there are days with no rows at all then they will not be returned. If you need those days as well with a NULL or the highest value from the previous day then the query would need to be altered a bit.
Something like
SELECT dateadd(dd,0, datediff(dd,0,datetime)) as day, MAX(value)
FROM table GROUP BY dateadd(dd,0, datediff(dd,0,datetime)) WHERE
datetime < '2009-01-01' AND datetime > '2007-12-31'
Assuming datetime is your date column, dateadd(dd,0, datediff(dd,0,datetime)) will extract only the date part, and then you can group by that value to get a maximum daily value. There might be a prettier way to get only the date part though.
You can also use the between construct to avoid the less than and greater than.
Group on the date, use the max delegate to get the highest value for each date, sort on the value, and get the first record.
Example:
select top 1 theDate, max(theValue)
from TheTable
group by theDate
order by max(theValue) desc
(The date field needs to only contain a date for this grouping to work, i.e. the time component has to be zero.)
If you need to limit the query for a specific year, use a starting and ending date in a where claues:
select top 1 theDate, max(theValue)
from TheTable
where theDate between '2008-01-01' and '2008-12-13'
group by theDate
order by max(theValue) desc

Execute count(*) on a group-by result-set

I am trying to do a nice SQL statement inside a stored procedure.
I looked at the issue of seeing the number of days that events happened between two dates.
My example is sales orders: for this month, how many days did we have sales orders?
Suppose this setup:
CREATE TABLE `sandbox`.`orders` (
`year` int,
`month` int,
`day` int,
`desc` varchar(255)
)
INSERT INTO orders (year, month, day, desc)
VALUES (2009,1,1, 'New Years Resolution 1')
,(2009,1,1, 'Promise lose weight')
,(2009,1,2, 'Bagel')
,(2009,1,12, 'Coffee to go')
For this in-data the result should be 3, since there has been three days with sale.
The best solution I found is as below.
However, making a temporary table, counting that then dropping it seemes excess. It "should" be possible in one statement.
Anyone who got a "nicer" solution then me?
/L
SELECT [Year], [Month], [Day]
INTO #Some_Days
FROM Quarter
WHERE Start >= '2009-01-01' AND [End] < '2009-01-16'
GROUP BY [Year], [Month], [Day]
SELECT count(*) from #Some_Days
Apologies if I'm misunderstanding the question, but perhaps you could do something like this, as an option:
SELECT COUNT(*) FROM
(SELECT DISTINCT(SomeColumn)
FROM MyTable
WHERE Something BETWEEN 100 AND 500
GROUP BY SomeColumn) MyTable
... to get around the temp-table creation and disposal?
There are two basic options which I can see. One is to group everything up in a sub query, then count those distinct rows (Christian Nunciato's answer). The second is to combine the multiple fields and count distinct values of that combined value.
In this case, the following formula coverts the three fields into a single datetime.
DATEADD(YEAR, [Quarter].Year, DATEADD(MONTH, [Quarter].Month, DATEADD(DAY, [Quarter].DAY, 0), 0), 0)
Thus, COUNT(DISTINCT [formula]) will give the answer you need.
SELECT
COUNT(DISTINCT DATEADD(YEAR, [Quarter].Year, DATEADD(MONTH, [Quarter].Month, DATEADD(DAY, [Quarter].DAY, 0), 0), 0))
FROM
Quarter
WHERE
[Quarter].Start >= '2009-01-01'
AND [Quarter].End < '2009-01-16'
I usually use the sub query route, but depending on what you're doing, indexes, size of table, simplicity of the formula, etc, this Can be faster...
Dems.
How about:
SELECT COUNT(DISTINCT day) FROM orders
WHERE (year, month) = (2009, 1);
Actually, I don't know if TSQL supports tuple comparisons, but you get the idea.
COUNT(DISTINCT expr) is standard SQL and should work everywhere.
You should use nested Select statements. Inner one should contain group by clause, and the outer one should count it. I think "Christian Nunciato" helped you already.
Select Count(1) As Quantity
From
(
SELECT [Year], [Month], [Day]
INTO #Some_Days
FROM Quarter
WHERE Start >= '2009-01-01' AND [End] < '2009-01-16'
GROUP BY [Year], [Month], [Day]
) AS InnerResultSet
SELECT [Year], [Month], [Day]
FROM Quarter
WHERE Start >= '2009-01-01' AND [End] < '2009-01-16'
GROUP BY [Year], [Month], [Day]
COMPUTE COUNT(*)

SQL for counting events by date

I feel like I've seen this question asked before, but neither the SO search nor google is helping me... maybe I just don't know how to phrase the question. I need to count the number of events (in this case, logins) per day over a given time span so that I can make a graph of website usage. The query I have so far is this:
select
count(userid) as numlogins,
count(distinct userid) as numusers,
convert(varchar, entryts, 101) as date
from
usagelog
group by
convert(varchar, entryts, 101)
This does most of what I need (I get a row per date as the output containing the total number of logins and the number of unique users on that date). The problem is that if no one logs in on a given date, there will not be a row in the dataset for that date. I want it to add in rows indicating zero logins for those dates. There are two approaches I can think of for solving this, and neither strikes me as very elegant.
Add a column to the result set that lists the number of days between the start of the period and the date of the current row. When I'm building my chart output, I'll keep track of this value and if the next row is not equal to the current row plus one, insert zeros into the chart for each of the missing days.
Create a "date" table that has all the dates in the period of interest and outer join against it. Sadly, the system I'm working on already has a table for this purpose that contains a row for every date far into the future... I don't like that, and I'd prefer to avoid using it, especially since that table is intended for another module of the system and would thus introduce a dependency on what I'm developing currently.
Any better solutions or hints at better search terms for google? Thanks.
Frankly, I'd do this programmatically when building the final output. You're essentially trying to read something from the database which is not there (data for days that have no data). SQL isn't really meant for that sort of thing.
If you really want to do that, though, a "date" table seems your best option. To make it a bit nicer, you could generate it on the fly, using i.e. your DB's date functions and a derived table.
I had to do exactly the same thing recently. This is how I did it in T-SQL (
YMMV on speed, but I've found it performant enough over a coupla million rows of event data):
DECLARE #DaysTable TABLE ( [Year] INT, [Day] INT )
DECLARE #StartDate DATETIME
SET #StartDate = whatever
WHILE (#StartDate <= GETDATE())
BEGIN
INSERT INTO #DaysTable ( [Year], [Day] )
SELECT DATEPART(YEAR, #StartDate), DATEPART(DAYOFYEAR, #StartDate)
SELECT #StartDate = DATEADD(DAY, 1, #StartDate)
END
-- This gives me a table of all days since whenever
-- you could select #StartDate as the minimum date of your usage log)
SELECT days.Year, days.Day, events.NumEvents
FROM #DaysTable AS days
LEFT JOIN (
SELECT
COUNT(*) AS NumEvents
DATEPART(YEAR, LogDate) AS [Year],
DATEPART(DAYOFYEAR, LogDate) AS [Day]
FROM LogData
GROUP BY
DATEPART(YEAR, LogDate),
DATEPART(DAYOFYEAR, LogDate)
) AS events ON days.Year = events.Year AND days.Day = events.Day
Create a memory table (a table variable) where you insert your date ranges, then outer join the logins table against it. Group by your start date, then you can perform your aggregations and calculations.
The strategy I normally use is to UNION with the opposite of the query, generally a query that retrieves data for rows that don't exist.
If I wanted to get the average mark for a course, but some courses weren't taken by any students, I'd need to UNION with those not taken by anyone to display a row for every class:
SELECT AVG(mark), course FROM `marks`
UNION
SELECT NULL, course FROM courses WHERE course NOT IN
(SELECT course FROM marks)
Your query will be more complex but the same principle should apply. You may indeed need a table of dates for your second query
Option 1
You can create a temp table and insert dates with the range and do a left outer join with the usagelog
Option 2
You can programmetically insert the missing dates while evaluating the result set to produce the final output
WITH q(n) AS
(
SELECT 0
UNION ALL
SELECT n + 1
FROM q
WHERE n < 99
),
qq(n) AS
(
SELECT 0
UNION ALL
SELECT n + 1
FROM q
WHERE n < 99
),
dates AS
(
SELECT q.n * 100 + qq.n AS ndate
FROM q, qq
)
SELECT COUNT(userid) as numlogins,
COUNT(DISTINCT userid) as numusers,
CAST('2000-01-01' + ndate AS DATETIME) as date
FROM dates
LEFT JOIN
usagelog
ON entryts >= CAST('2000-01-01' AS DATETIME) + ndate
AND entryts < CAST('2000-01-01' AS DATETIME) + ndate + 1
GROUP BY
ndate
This will select up to 10,000 dates constructed on the fly, that should be enough for 30 years.
SQL Server has a limitation of 100 recursions per CTE, that's why the inner queries can return up to 100 rows each.
If you need more than 10,000, just add a third CTE qqq(n) and cross-join with it in dates.