I am trying to improve my query writing and need help with the following...
I have one table with multiple columns, including Operation_Code, Operation_Category, Downtime_In_Minutes, and Downtime (as a percentage of the last 24 hours). Each line of my result set needs the SUM of Downtime_In_Minutes for each Operation_Code and a count of the occurrences of that Operation_Code. StopDate will always be yesterday. My date functions and formatting already return yesterday's date; I have left them out of the query below because of their length, but they work. So, each line in the results should look like:
StopDate
Operation_Code
Operation_Category
Count (# of occurrences of each Op_Code)
SUM (in minutes) of all downtime for each Operation_Code
% of Last 24 hours
Example Results:
StopDate   Op_Code  Op_Category  Count  Downtime (Minutes)  % of Last 24
7/18/2021  X123     Grinder      10     720                 50%
7/18/2021  A800     Cutter       12     360                 25%
7/18/2021  O225     Polisher     5      60                  4%
My query without attempting any aggregations is basically:
Select StopDate,
OpCode,
OpCat
From DTS
Where StopDate = yesterday
My basic question is: how do I get the count of occurrences and SUM the total time in minutes for each unique Operation_Code?
Thanks in advance!
Are you just looking for aggregation? Then you can use a window function to get the ratio, which I am guessing is based on the downtime:
Select StopDate, OpCode, OpCat, count(*) as cnt,
       sum(Downtime_In_Minutes) as downtime_minutes,
       -- sum(sum(...)) over (): the window total of the per-group sums
       sum(Downtime_In_Minutes) * 1.0 / nullif(sum(sum(Downtime_In_Minutes)) over (), 0) as ratio
From DTS
Where StopDate = yesterday
Group by StopDate, OpCode, OpCat;
I assume you know how to deal with "yesterday", because you say that you already have a query.
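A minimal sketch of this aggregation, run against SQLite through Python's sqlite3 module so it can be tried without a database server. The sample rows and the `downtime_report` helper are made up for illustration. The percentage is taken against the 1,440 minutes in a day, which matches the "720 minutes = 50%" row in the question; that reading of "% of Last 24" is an assumption.

```python
import sqlite3

def downtime_report(rows, stop_date):
    """Return (StopDate, OpCode, OpCat, cnt, minutes, pct_of_day) per operation code."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE DTS (StopDate TEXT, OpCode TEXT, OpCat TEXT, "
        "Downtime_In_Minutes INTEGER)"
    )
    con.executemany("INSERT INTO DTS VALUES (?, ?, ?, ?)", rows)
    result = con.execute(
        """
        SELECT StopDate, OpCode, OpCat,
               COUNT(*)                                AS cnt,
               SUM(Downtime_In_Minutes)                AS downtime_minutes,
               SUM(Downtime_In_Minutes) * 100.0 / 1440 AS pct_of_day
        FROM DTS
        WHERE StopDate = ?
        GROUP BY StopDate, OpCode, OpCat
        ORDER BY downtime_minutes DESC
        """,
        (stop_date,),
    ).fetchall()
    con.close()
    return result

# Hypothetical sample: two grinder stops and one cutter stop on the same day.
sample = [
    ("2021-07-18", "X123", "Grinder", 400),
    ("2021-07-18", "X123", "Grinder", 320),
    ("2021-07-18", "A800", "Cutter", 360),
    ("2021-07-17", "A800", "Cutter", 999),  # filtered out by the WHERE clause
]
report = downtime_report(sample, "2021-07-18")
for line in report:
    print(line)
```

Every non-aggregated column in the SELECT list appears in the GROUP BY, which is what most engines other than SQLite insist on.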
I have a scenario where I need to show daily transactions and also total transaction for that month with date and other fields like type, product etc.
Once I have that, the main requirement is to get each day's percentage of the monthly total. In the example below there are 3 transactions on 1 Jan and 257 in total for January, so the percentage for 1 Jan is (3/257)*100; similarly, there are 10 transactions on 2 Jan, giving (10/257)*100, and so on.
Can anyone help me with the SQL query?
Date      Type  Transaction  Total_For_month  Percentage
1/1/2017  A     3            257              1%
1/2/2017  B     10           257              4%
1/3/2017  A     5            257              2%
1/4/2017  C     8            257              3%
1/5/2017  D     12           257              5%
1/6/2017  D     17           257              7%
Use window functions:
select t.*,
sum(transaction) over (partition by to_char(date, 'YYYY-MM')) as total_for_month,
transaction / sum(transaction) over (partition by to_char(date, 'YYYY-MM')) as ratio
from t;
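The partition-by-month idea can be tried out with a small sketch. It uses SQLite (3.25 or later, for window functions) through Python's sqlite3 module instead of Oracle, so `strftime('%Y-%m', ...)` stands in for `to_char(date, 'YYYY-MM')`. The column names `dt`, `tp` and `txn_count` and the sample rows are assumptions. The `100.0 *` factor forces floating-point division, which SQLite needs for integer columns.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (dt TEXT, tp TEXT, txn_count INTEGER)")
con.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        ("2017-01-01", "A", 3),
        ("2017-01-02", "B", 10),
        ("2017-01-03", "A", 7),
        ("2017-02-01", "C", 5),  # a different month, so a different partition
    ],
)
rows = con.execute(
    """
    SELECT dt, tp, txn_count,
           SUM(txn_count) OVER (PARTITION BY strftime('%Y-%m', dt)) AS total_for_month,
           100.0 * txn_count
               / SUM(txn_count) OVER (PARTITION BY strftime('%Y-%m', dt)) AS pct
    FROM t
    ORDER BY dt
    """
).fetchall()
con.close()

for row in rows:
    print(row)
```

Each input row yields one output row, carrying its month's total alongside it.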
DATE and TYPE are Oracle keywords, I hope you are not using them literally as column names. I will use DT and TP below.
You didn't say one way or the other, but it seems like you must filter your data so that the final report is for a single month (rather than for a full year, say). If so, you could do something like this. Notice the analytic function RATIO_TO_REPORT. Note that I multiply the ratio by 100, and I use some non-standard formatting to get the result in the "percentage" format; don't worry too much if you don't understand that part from the first reading.
select dt, tp, transaction, sum(transaction) over () as total_trans_for_month,
to_char(100 * ratio_to_report(transaction) over (), '90.0L',
'nls_currency=%') as pct_of_monthly_trans
from your_table
where dt >= date '2017-01-01' and dt < add_months(date '2017-01-01', 1)
order by dt -- if needed (add more criteria as appropriate).
Notice the analytic clause: over (). We are not partitioning by anything, and we are not ordering by anything either; but since we want every row of input to generate a row in the output, we still need the analytic version of sum, and the analytic function ratio_to_report. The proper way to achieve this is to include the over clause, but leave it empty: over ().
Note also that in the where clause I did not wrap dt within trunc or to_char or any other function. If you are lucky, there is an index on that column, and writing the where conditions as I did allows that index to be used, if the Optimizer finds it should be.
The date '2017-01-01' is arbitrary (chosen to match your example); in production it should probably be a bind variable.
I have a table from which I'm trying to get not only the sum of the time difference (in hours) between two columns, but also the number of times that difference is above a set amount, 6 in this case.
I got the total from "Getting the sum of a datediff result", but can I, in the same query, also get count(*) where datediff >= 6?
Thanks in advance for any and all help.
DateDiff used for hours will probably not be useful, as it will return 1 hour from, say 10:55 to 11:03.
So count minutes:
Select
*, DateDiff("n", [TimeStart], [TimeEnd]) / 60 As Hours
From
YourTable
Save this query and use it as source in a new query to count those entries with an hour count greater than or equal to six:
Select Count(*) As Entries
From YourQuery
Where Hours >= 6
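If both figures are wanted from a single statement, conditional aggregation is one option. The sketch below is an assumption-laden illustration rather than Access code: it runs on SQLite through Python's sqlite3 module, derives the hours from epoch seconds because SQLite has no DateDiff, and uses made-up sample rows. In Access the same idea would typically be written with Sum(IIf(...)).

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE YourTable (TimeStart TEXT, TimeEnd TEXT)")
con.executemany(
    "INSERT INTO YourTable VALUES (?, ?)",
    [
        ("2024-01-01 08:00:00", "2024-01-01 14:00:00"),  # exactly 6 hours
        ("2024-01-01 10:55:00", "2024-01-01 11:03:00"),  # 8 minutes
        ("2024-01-02 09:00:00", "2024-01-02 16:30:00"),  # 7.5 hours
    ],
)
total_hours, long_entries = con.execute(
    """
    SELECT SUM(hours)                                  AS total_hours,
           SUM(CASE WHEN hours >= 6 THEN 1 ELSE 0 END) AS entries
    FROM (
        -- elapsed seconds / 3600.0 gives fractional hours
        SELECT (CAST(strftime('%s', TimeEnd) AS INTEGER)
              - CAST(strftime('%s', TimeStart) AS INTEGER)) / 3600.0 AS hours
        FROM YourTable
    )
    """
).fetchone()
con.close()

print(total_hours, long_entries)
```

The inner query plays the role of the saved query above; the outer one sums the hours and counts the long entries in one pass.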
I have a pretty huge table with columns dates, account, amount, etc. eg.
date      account  amount
4/1/2014  XXXXX1   80
4/1/2014  XXXXX1   20
4/2/2014  XXXXX1   840
4/3/2014  XXXXX1   120
4/1/2014  XXXXX2   130
4/3/2014  XXXXX2   300
...........
(I have 40 months' worth of daily data and multiple accounts.)
The final output I want is the average amount of each account each month. Since there may or may not be a record for any account on a given day, and I have a separate table of holidays for 2011~2014, I am summing the amount of each account within a month and dividing it by the number of business days in that month. Note that there are very likely to be records on weekends/holidays, so I need to exclude them from the calculation. Also, I want one record for each date available in the original table, e.g.
date      account  amount
4/1/2014  XXXXX1   48 ((80+20+840+120)/22)
4/2/2014  XXXXX1   48
4/3/2014  XXXXX1   48
4/1/2014  XXXXX2   19 ((130+300)/22)
4/3/2014  XXXXX2   19
...........
(Suppose the above is the only data I have for Apr-2014.)
I am able to do this in a hacky and slow way, but as I need to join this process with other subqueries, I really need to optimize this query. My current code looks like:
select
date,
account,
sum(amount/days_mon) over (partition by last_day(date))
from(
select
date,
-- there are more calculation to get the account numbers,
-- so this subquery is necessary
account,
amount,
-- this is a list of month-end dates that the number of
-- business days in that month is 19. similar below.
case when last_day(date) in ('','',...,'') then 19
when last_day(date) in ('','',...,'') then 20
when last_day(date) in ('','',...,'') then 21
when last_day(date) in ('','',...,'') then 22
when last_day(date) in ('','',...,'') then 23
end as days_mon
from mytable tb
inner join lookup_businessday_list busi
on tb.date = busi.date)
So how can I perform the above purpose efficiently? Thank you!
This approach uses sub-query factoring, which other RDBMS flavours call common table expressions. The attraction here is that we can pass the output from one CTE as input to another.
The first CTE generates a list of dates in a given month (you can extend this over any range you like).
The second CTE uses an anti-join against the holidays table to filter out dates which are holidays, and a day-number test to filter out dates which aren't weekdays. Note that the day number varies according to the NLS_TERRITORY setting; in my realm the weekend is days 6 and 7, but SQL Fiddle is American, so there it is 1 and 7.
with dates as ( select date '2014-04-01' + ( level - 1) as d
from dual
connect by level <= 30 )
, bdays as ( select d
, count(d) over () tot_d
from dates
left join holidays
on dates.d = holidays.hol_date
where holidays.hol_date is null
and to_number(to_char(dates.d, 'D')) between 2 and 6
)
select yt.account
, yt.txn_date
, sum(yt.amount) over (partition by yt.account, trunc(yt.txn_date,'MM'))
/tot_d as avg_amt
from your_table yt
join bdays
on bdays.d = yt.txn_date
order by yt.account
, yt.txn_date
/
I haven't rounded the average amount.
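For anyone without an Oracle instance to hand, here is a rough equivalent on SQLite (3.25 or later) through Python's sqlite3 module. It is a sketch under assumptions: a recursive CTE replaces `connect by level`, `strftime('%w', ...)` (0 = Sunday, regardless of territory) replaces `to_char(d, 'D')`, and the table contents, including the single holiday row, are sample data.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript(
    """
    CREATE TABLE your_table (txn_date TEXT, account TEXT, amount INTEGER);
    CREATE TABLE holidays (hol_date TEXT);
    INSERT INTO holidays VALUES ('2014-04-18');          -- sample holiday (a Friday)
    INSERT INTO your_table VALUES
        ('2014-04-01', 'XXXXX1', 80),
        ('2014-04-01', 'XXXXX1', 20),
        ('2014-04-02', 'XXXXX1', 840),
        ('2014-04-03', 'XXXXX1', 120),
        ('2014-04-05', 'XXXXX1', 500),                   -- Saturday, must be ignored
        ('2014-04-01', 'XXXXX2', 130),
        ('2014-04-03', 'XXXXX2', 300);
    """
)
rows = con.execute(
    """
    WITH RECURSIVE dates(d) AS (
        SELECT '2014-04-01'
        UNION ALL
        SELECT date(d, '+1 day') FROM dates WHERE d < '2014-04-30'
    ),
    bdays AS (
        SELECT d, COUNT(*) OVER () AS tot_d               -- business days in the month
        FROM dates
        LEFT JOIN holidays ON dates.d = holidays.hol_date
        WHERE holidays.hol_date IS NULL                   -- anti-join: drop holidays
          AND CAST(strftime('%w', d) AS INTEGER) BETWEEN 1 AND 5
    )
    SELECT yt.account, yt.txn_date,
           SUM(yt.amount) OVER (PARTITION BY yt.account, strftime('%Y-%m', yt.txn_date))
               * 1.0 / bdays.tot_d AS avg_amt
    FROM your_table yt
    JOIN bdays ON bdays.d = yt.txn_date
    ORDER BY yt.account, yt.txn_date
    """
).fetchall()
con.close()

for row in rows:
    print(row)
```

April 2014 has 22 weekdays, so with the one sample holiday removed the divisor is 21; the Saturday row drops out of both the sum and the output because it has no match in `bdays`.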
You have 40 months of data, so most of it should be very stable.
I will assume that you have a cold body (a big, stable, easily definable range of data) and a hot tail (a small, active part).
Next, I would like to define a minimal period: the smallest interval of data that is interesting to the business.
It might be a year, month, day, hour, etc. Do you expect to get questions like "what was the average for that account between 1900 and 12am yesterday?".
I will assume that the answer is DAY.
Then,
I will calculate sum(amount) and count() for every account for every DAY of the cold body.
I will not create dummy records if a particular account had no activity on some day.
I will save day, account, total amount and count in a TABLE.
If there are modifications to the cold body later, you delete and reload the affected day in that table.
For the hot tail there are several possible strategies:
1. Do the same as above (same process, simple to support).
2. Always calculate on the fly.
3. Use a materialized view as a compromise between 1 and 2.
Cold body table totalc could also be implemented as a materialized view, but if the data never changes there is no need to rebuild it.
With this you go from (number of accounts) x (number of transactions per day) x (number of days) records to (number of accounts) x (number of active days) records.
That should speed up all following calculations.
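A minimal sketch of the cold-body summary table and the "delete and reload the affected day" step, using SQLite through Python's sqlite3 module. The table names `txns` and `daily_totals` and the `reload_day` helper are hypothetical, chosen only for this illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript(
    """
    CREATE TABLE txns (txn_date TEXT, account TEXT, amount INTEGER);
    CREATE TABLE daily_totals (
        txn_date TEXT, account TEXT, total_amount INTEGER, txn_count INTEGER,
        PRIMARY KEY (txn_date, account)
    );
    """
)

def reload_day(con, day):
    """Delete and rebuild the summary rows for one day."""
    con.execute("DELETE FROM daily_totals WHERE txn_date = ?", (day,))
    con.execute(
        """
        INSERT INTO daily_totals
        SELECT txn_date, account, SUM(amount), COUNT(*)
        FROM txns
        WHERE txn_date = ?
        GROUP BY txn_date, account
        """,
        (day,),
    )

con.executemany(
    "INSERT INTO txns VALUES (?, ?, ?)",
    [
        ("2014-04-01", "XXXXX1", 80),
        ("2014-04-01", "XXXXX1", 20),
        ("2014-04-01", "XXXXX2", 130),
    ],
)
reload_day(con, "2014-04-01")
first_load = con.execute("SELECT * FROM daily_totals ORDER BY account").fetchall()

# A late correction to the cold body: reload only the affected day.
con.execute("INSERT INTO txns VALUES ('2014-04-01', 'XXXXX2', 70)")
reload_day(con, "2014-04-01")
after_fix = con.execute("SELECT * FROM daily_totals ORDER BY account").fetchall()
con.close()

print(first_load)
print(after_fix)
```

Only account/day pairs with activity get a row, so later monthly averages read from `daily_totals` instead of the raw transactions.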
I'm trying to get an estimate of how many hours people worked during a set period of time. I want to show this by department and by what area they were working in. Right now I have this:
SELECT M.MemberDepartmentID,T.TaskName,
COUNT(DATEDIFF(HOUR, TT.StartTime, TT.EndTime)) 'Hours',
AVG(DATEDIFF(HOUR, TT.StartTime, TT.EndTime)) Average
FROM Member.TaskTracking TT
LEFT OUTER JOIN Member.Task T
ON TT.TaskID=T.TaskID
JOIN dbo.tblMember M
ON TT.MemberID=M.MemberID
WHERE M.FullTime=1
AND M.EmployeeSalary=1
AND (TT.StartTime >= '2013-10-01'
AND TT.EndTime < '2013-11-01')
GROUP BY M.MemberDepartmentID,T.TaskName
ORDER BY M.MemberDepartmentID,T.TaskName
I don't know how to confirm if it's correct, but some are definitely showing averages of zero even if there were hours worked. And some averages are way higher than the hours worked. For instance, here are some of my results:
MemberDepartmentID  TaskName     Hours  Average
-----------------------------------------------
1                   Packing      25     0
1                   Picking      6      0
1                   PreScanning  38     7
4                   Picking      2      104
Suggestions?
First, it is important to note that DATEDIFF(HOUR) returns an integer, and it does not necessarily give a good reflection of how much time has actually passed. For example, these both yield 1:
SELECT DATEDIFF(HOUR, '03:59', '04:01'); -- 2 minutes (0.033333 hours)
SELECT DATEDIFF(HOUR, '03:01', '04:59'); -- 118 minutes (1.966666 hours)
And these both yield 0:
SELECT DATEDIFF(HOUR, '03:01', '03:59'); -- 58 minutes (0.966666 hours)
SELECT DATEDIFF(HOUR, '03:01', '03:02'); -- 1 minute (0.016666 hours)
Next, if you give SQL Server integers to divide, it's going to perform integer math. Meaning it will divide, but it will discard any remainder. This yields 0:
SELECT 3/4;
Even though really it's 0.75, and if it rounded up it should be 1. (Not that either of those results are particularly meaningful). Now, extend that to average.
DECLARE @d1 TABLE(a INT);
INSERT @d1 VALUES(3),(4);
SELECT AVG(a) FROM @d1;
This yields 3, not 3.5, which you would probably expect. For the same reasons as above.
Remembering that some of your tasks may have lasted up to 59 minutes, but would still yield an hour differential of 0, you could have, say, 4 tasks, three that lasted > 1 hour, and one that lasted < 1 hour. So your average calculation would essentially be:
SELECT (1+1+1+0)/4;
Which, as above, still yields 0.
If you want a meaningful average there, you should calculate the time spent more granularly than by hours. For example, you could perform the datediff in minutes:
SELECT DATEDIFF(MINUTE, '03:01', '04:59');
This yields 118. If you want to express that in hours, you could divide by 60.0 (the decimal is important) or multiply by 1.0:
SELECT DATEDIFF(MINUTE, '03:01', '04:59')/60.0;
SELECT 1.0*DATEDIFF(MINUTE, '03:01', '04:59')/60;
These both yield 1.966666. Much more meaningful to average such a result. So perhaps change your expression to:
Average = AVG(1.0*DATEDIFF(MINUTE, TT.StartTime, TT.EndTime)/60)
About the count, not sure what you're attempting to do there, but you may want to make similar adjustments to the calculation and probably consider using SUM. If you show some sample data and the results you expect, we can help more.
Also I recommend not escaping keyword aliases using 'single quotes' - some forms of this syntax are deprecated, and it makes your alias look like a string literal. First, try not to use keywords or otherwise invalid identifiers as aliases; but if you must, escape them with [square brackets].
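To see the effect outside the database, here is a small Python imitation of the two calculations. It is only a sketch: `datediff_hour` and `hours_from_minutes` are hypothetical helpers that mimic the boundary-counting behaviour described above, and the three task intervals are the examples from this answer.

```python
from datetime import datetime

def datediff_hour(start, end):
    """Count hour boundaries crossed between start and end (not elapsed hours)."""
    s = start.replace(minute=0, second=0, microsecond=0)
    e = end.replace(minute=0, second=0, microsecond=0)
    return int((e - s).total_seconds() // 3600)

def hours_from_minutes(start, end):
    """Count minute boundaries crossed, then convert to fractional hours."""
    s = start.replace(second=0, microsecond=0)
    e = end.replace(second=0, microsecond=0)
    minutes = int((e - s).total_seconds() // 60)
    return minutes / 60.0

def at(hhmm):
    """Parse 'HH:MM' on an arbitrary fixed date."""
    return datetime.strptime("2013-10-01 " + hhmm, "%Y-%m-%d %H:%M")

tasks = [("03:59", "04:01"), ("03:01", "04:59"), ("03:01", "03:59")]
by_hour = [datediff_hour(at(a), at(b)) for a, b in tasks]
by_minute = [hours_from_minutes(at(a), at(b)) for a, b in tasks]

int_avg = sum(by_hour) // len(by_hour)        # integer math, like AVG over INT values
real_avg = sum(by_minute) / len(by_minute)    # average of fractional hours

print(by_hour, int_avg)
print(by_minute, real_avg)
```

The hour-based list bears little relation to the time actually spent, and its integer average collapses to zero, which is the pattern in the results posted in the question.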
How can I calculate the average for each 30-second interval? The following is the table structure:
Price TTime
Every minute, 5-60 records are inserted. The time is inserted by getDate(). I have to calculate the average for every 30 seconds.
You need to do 2 things:
Create a column (in your SELECT result, not in the table) that contains the time in half-minutes;
calculate the average of Price using AVG(Price) and GROUP BY:
SELECT <function returning half minutes from TTime> AS HalfMinute, AVG(Price) FROM <Table> GROUP BY HalfMinute
I don't know SQL Server's time functions. If you can get the time returned in seconds, you could go with SECONDS/30. Maybe someone else can step in with details here.
Something like:
SELECT
  AVG(Price) AS AvgPrice,
  COUNT(Price) AS CountPrice,
  MIN(TTime) AS PeriodBegin,
  (DATEPART(second, TTime) / 30) * 30 AS PeriodType /* either 0 or 30 */
FROM
  PriceTable
GROUP BY
  YEAR(TTime), MONTH(TTime), DAY(TTime), DATEPART(hour, TTime), DATEPART(minute, TTime),
  DATEPART(second, TTime) / 30 /* integer division: either 0 or 1 */
ORDER BY
  MIN(TTime)
In place of:
GROUP BY
  YEAR(TTime), MONTH(TTime), DAY(TTime), DATEPART(hour, TTime), DATEPART(minute, TTime)
you could also use, for example:
GROUP BY
  LEFT(CONVERT(varchar, TTime, 120), 16)
In any case, these are operations that invoke a table scan, since they are not indexable. A WHERE clause to restrict the TTime range is advisable.
You could also make a column that contains the calculated date ('…:00.000' or '…:30.000') and fill that on INSERT with help of a trigger. Place an index on it, GROUP BY it, done.
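The half-minute bucketing can be sketched end to end as follows. This runs on SQLite through Python's sqlite3 module rather than SQL Server, so it is an illustration of the grouping idea, not of SQL Server syntax: the bucket is the epoch second count divided by 30 with integer division, and the sample prices are made up.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE PriceTable (Price REAL, TTime TEXT)")
con.executemany(
    "INSERT INTO PriceTable VALUES (?, ?)",
    [
        (10.0, "2024-01-01 12:00:05"),
        (20.0, "2024-01-01 12:00:25"),
        (30.0, "2024-01-01 12:00:31"),
        (50.0, "2024-01-01 12:00:59"),
        (7.0, "2024-01-01 12:01:00"),
    ],
)
buckets = con.execute(
    """
    SELECT datetime((CAST(strftime('%s', TTime) AS INTEGER) / 30) * 30, 'unixepoch')
               AS period_begin,          -- start of the 30-second window
           AVG(Price) AS avg_price,
           COUNT(*)   AS cnt
    FROM PriceTable
    GROUP BY period_begin
    ORDER BY period_begin
    """
).fetchall()
con.close()

for bucket in buckets:
    print(bucket)
```

Grouping on a single computed window start avoids listing year, month, day, hour and minute separately, at the cost of the same full scan mentioned above unless the value is stored and indexed.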