SQL GROUP BY clause

I have a table from which I'm trying to pull some trend analysis. The columns are Date (timestamp), RiskCategory (text) and Point (int). For every date, I want to SUM the points grouped by RiskCategory. I can get the latest date via the following:
SELECT Date,RiskCategory,SUM(Point) AS Total
FROM risktrend WHERE DATE(Date) >= (SELECT MAX(DATE(Date)) FROM risktrend)
GROUP BY RiskCategory;
but I am struggling to return the same for ALL dates. I am using MySQL.
I should elaborate further: any date can have multiple entries, but RiskCategory can only be Administrative, Availability, or Capacity. So for every date I should see a SUM of points for each of those three. For example:
2010-10-06 Capacity 508
2010-10-06 Administrative 113
2010-10-06 Availability 243
2010-10-07 Capacity 493
2010-10-07 Administrative 257
2010-10-07 Availability 324

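You need to add the date to your GROUP BY clause and drop the WHERE filter, which limits the result to the latest date. Because Date is a timestamp, group on DATE(Date) so that every entry from the same day lands in one group:
SELECT DATE(Date) AS Day,
RiskCategory,
SUM(Point) AS Total
FROM risktrend
GROUP BY DATE(Date), RiskCategory
ORDER BY Day, RiskCategory;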
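To check the per-day grouping without a MySQL server, here is a minimal sketch using Python's built-in `sqlite3` module as a stand-in. It assumes SQLite's `DATE()` truncates an ISO timestamp to its day the same way MySQL's `DATE()` does for this purpose; the timestamps and the split of the 508 Capacity points into two rows are made up so that two entries fall on the same day.

```python
import sqlite3

# In-memory SQLite stand-in for the MySQL risktrend table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE risktrend (Date TEXT, RiskCategory TEXT, Point INTEGER)")
conn.executemany(
    "INSERT INTO risktrend VALUES (?, ?, ?)",
    [
        ("2010-10-06 09:00:00", "Capacity", 500),   # two Capacity rows on the
        ("2010-10-06 15:30:00", "Capacity", 8),     # same day, different times
        ("2010-10-06 10:00:00", "Administrative", 113),
        ("2010-10-07 11:00:00", "Capacity", 493),
        ("2010-10-07 12:00:00", "Availability", 324),
    ],
)

# Group on the calendar day (not the raw timestamp) plus the category.
rows = conn.execute(
    """
    SELECT DATE(Date) AS Day, RiskCategory, SUM(Point) AS Total
    FROM risktrend
    GROUP BY DATE(Date), RiskCategory
    ORDER BY Day, RiskCategory
    """
).fetchall()

for row in rows:
    print(row)
```

Grouping on the raw `Date` column instead would keep the 09:00 and 15:30 Capacity rows apart, which is why the day is extracted first.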

Related

How to separate monthly data from daily data using SQL or BigQuery?

We receive data monthly for some IDs and daily for others. I need to classify the data as monthly or daily before applying the logic required for further analysis.
I have tried using a datediff SQL function for this, but it's not really helpful in this case. Is there any way to separate the daily data from the monthly data using SQL or BigQuery?
Id date
55 11-02-2022 00:00
66 15-05-2022 00:00
77 13-08-2022 00:00
66 15-07-2022 00:00
77 12-08-2022 00:00
55 12-02-2022 00:00
66 15-06-2022 00:00
A count aggregation per ID per month is an efficient way to separate the two groups. The open question is whether a single ID always updates at the same cadence; if it does not, you need to decide whether to classify it separately per time period, or to treat it as daily if it was ever daily, etc.
Here's the basic logic to categorize the two groups by month:
select
id
, date_trunc(date, MONTH) as mo
, count(*) > 1 as is_daily
from tbl
group by 1,2;
That gives you a per-id, per-month categorization.
Here is the same logic taken a step further to categorize any ID that ever receives daily updates as daily.
with bins as (
select
id
, date_trunc(date, MONTH) as mo
, count(*) > 1 as is_daily
from tbl
group by 1,2
)
select
id
, sum(case when is_daily then 1 else 0 end) > 0 as is_daily
from bins
group by 1;
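Both steps can be tried locally with a small sketch in Python's `sqlite3`, assuming the question's dd-mm-yyyy strings have already been parsed into ISO dates. SQLite has no `date_trunc`, so `strftime('%Y-%m', d)` stands in for the month bucket, and `MAX` over the 0/1 flag replaces the `sum(case ...) > 0` rollup (the two are equivalent for a 0/1 flag). The column name `d` is a hypothetical rename of `date`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (id INTEGER, d TEXT)")
# Sample rows from the question, rewritten as ISO dates.
conn.executemany("INSERT INTO tbl VALUES (?, ?)", [
    (55, "2022-02-11"), (66, "2022-05-15"), (77, "2022-08-13"),
    (66, "2022-07-15"), (77, "2022-08-12"), (55, "2022-02-12"),
    (66, "2022-06-15"),
])

# Step 1: more than one row for an id within a month => daily feed.
# SQLite returns the comparison as 1/0 rather than TRUE/FALSE.
per_month = conn.execute("""
    SELECT id, strftime('%Y-%m', d) AS mo, COUNT(*) > 1 AS is_daily
    FROM tbl
    GROUP BY id, mo
    ORDER BY id, mo
""").fetchall()

# Step 2: an id counts as daily if any of its months looks daily.
per_id = conn.execute("""
    WITH bins AS (
        SELECT id, strftime('%Y-%m', d) AS mo, COUNT(*) > 1 AS is_daily
        FROM tbl
        GROUP BY id, mo
    )
    SELECT id, MAX(is_daily) AS is_daily
    FROM bins
    GROUP BY id
    ORDER BY id
""").fetchall()

print(per_month)
print(per_id)
```

With this data, IDs 55 and 77 each have two rows in one month and are flagged daily, while 66 has one row per month and stays monthly.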

Oracle query for the attached output

I have a scenario where I need to show daily transactions and also the total transactions for that month, along with the date and other fields such as type, product, etc.
Once I have that, the main requirement is the daily percentage of the monthly total; an example is below. There are 3 transactions on 1 January and 257 for all of January, so the percentage for 1 January is (3/257)*100; similarly, there are 10 on 2 January and its percentage is (10/257)*100, and so on.
Can anyone help me with the SQL query?
Date Type Transaction Total_For_month Percentage
1/1/2017 A 3 257 1%
1/2/2017 B 10 257 4%
1/3/2017 A 5 257 2%
1/4/2017 C 8 257 3%
1/5/2017 D 12 257 5%
1/6/2017 D 17 257 7%
Use window functions:
select t.*,
sum(transaction) over (partition by to_char(date, 'YYYY-MM')) as total_for_month,
transaction / sum(transaction) over (partition by to_char(date, 'YYYY-MM')) as ratio
from t;
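The window-function idea can be sketched in Python's `sqlite3` (SQLite 3.25 or later is assumed, since that is when window functions arrived). It uses `strftime('%Y-%m', dt)` in place of Oracle's `to_char(date, 'YYYY-MM')`, renames the columns to `dt`, `tp` and `txn` because DATE, TYPE and TRANSACTION are keywords, and multiplies by 100 to turn the ratio into a percentage. Only the six January rows from the question are loaded, so their monthly total is 55 rather than 257; the February row is made up to show that each month gets its own total.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (dt TEXT, tp TEXT, txn INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", [
    ("2017-01-01", "A", 3), ("2017-01-02", "B", 10), ("2017-01-03", "A", 5),
    ("2017-01-04", "C", 8), ("2017-01-05", "D", 12), ("2017-01-06", "D", 17),
    ("2017-02-01", "A", 4),  # made-up row: February gets its own total
])

# Each row keeps its own values; the window sums over its whole month.
# 100.0 forces real arithmetic so the division is not truncated.
rows = conn.execute("""
    SELECT dt, tp, txn,
           SUM(txn) OVER (PARTITION BY strftime('%Y-%m', dt)) AS total_for_month,
           ROUND(100.0 * txn / SUM(txn) OVER (PARTITION BY strftime('%Y-%m', dt)), 1) AS pct
    FROM t
    ORDER BY dt
""").fetchall()

for r in rows:
    print(r)
```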
DATE and TYPE are Oracle keywords; I hope you are not using them literally as column names. I will use DT and TP below.
You didn't say one way or the other, but it seems like you must filter your data so that the final report is for a single month (rather than for a full year, say). If so, you could do something like this. Notice the analytic function RATIO_TO_REPORT. Note that I multiply the ratio by 100, and I use some non-standard formatting to get the result in the "percentage" format; don't worry too much if you don't understand that part from the first reading.
select dt, tp, transaction, sum(transaction) over () as total_trans_for_month,
to_char(100 * ratio_to_report(transaction) over (), '90.0L',
'nls_currency=%') as pct_of_monthly_trans
from your_table
where dt >= date '2017-01-01' and dt < add_months(date '2017-01-01', 1)
order by dt -- if needed (add more criteria as appropriate).
Notice the analytic clause: over (). We are not partitioning by anything, and we are not ordering by anything either; but since we want every row of input to generate a row in the output, we still need the analytic version of sum, and the analytic function ratio_to_report. The proper way to achieve this is to include the over clause, but leave it empty: over ().
Note also that in the where clause I did not wrap dt within trunc or to_char or any other function. If you are lucky, there is an index on that column, and writing the where conditions as I did allows that index to be used, if the Optimizer finds it should be.
The date '2017-01-01' is arbitrary (chosen to match your example); in production it should probably be a bind variable.

Group By with Case statement?

I need to find the sum of orders over 3-day ranges. Imagine a table like this:
Order Date
300 1/5/2015
200 1/6/2015
150 1/7/2015
250 1/5/2015
400 1/4/2015
350 1/3/2015
50 1/2/2015
100 1/8/2015
So I want to create a GROUP BY clause that groups rows with the same month and year and a day in 1-3, 4-6, 7-9, and so on until I reach 30 days.
It seems like what I want is a CASE for the grouping that includes some kind of loop, but I am not sure whether that is the best way or whether it is even possible to combine them.
An alternative might be to create a CASE expression that builds a new column holding a group number, and then group by that number, the month, and the year.
Unfortunately I've never used a case statement, so I am not sure which method is best or how to execute either, especially with a loop.
EDIT: I am using Access, so it looks like I will be using IIF instead of CASE.
Consider the Partition function and a crosstab query, for example:
TRANSFORM Sum(Calendar.Order) AS SumOfOrder
SELECT Month([CalDate]) AS TheMonth, Partition(Day([Caldate]),1,31,3) AS DayGroup
FROM Calendar
GROUP BY Month([CalDate]), Partition(Day([Caldate]),1,31,3)
PIVOT Year([CalDate]);
As an aside, I hope you have not named a field / column as Date.
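The group-number idea from the question does not need a loop: integer division of (day - 1) by 3 maps days 1-3 to 0, 4-6 to 1, and so on. Below is a minimal sketch in Python's `sqlite3` using the question's sample rows; `amount` and `order_date` are hypothetical renames of the reserved words Order and Date. In Access the integer-division operator is `\`, so the corresponding expression there would be `(Day([CalDate]) - 1) \ 3`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (amount INTEGER, order_date TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [
    (300, "2015-01-05"), (200, "2015-01-06"), (150, "2015-01-07"),
    (250, "2015-01-05"), (400, "2015-01-04"), (350, "2015-01-03"),
    (50, "2015-01-02"), (100, "2015-01-08"),
])

# (day - 1) / 3 is integer division: days 1-3 -> 0, 4-6 -> 1, 7-9 -> 2 ...
# Multiplying back and adding 1 labels each bucket by its first day.
rows = conn.execute("""
    SELECT CAST(strftime('%Y', order_date) AS INTEGER) AS yr,
           CAST(strftime('%m', order_date) AS INTEGER) AS mo,
           (CAST(strftime('%d', order_date) AS INTEGER) - 1) / 3 * 3 + 1 AS first_day,
           SUM(amount) AS total
    FROM orders
    GROUP BY yr, mo, first_day
    ORDER BY yr, mo, first_day
""").fetchall()

for r in rows:
    print(r)
```

With this formula, day 31 ends up in a bucket of its own (first_day 31), so months with 31 days get an eleventh, one-day group.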
How about the following:
COUNT OF ORDERS
select year([Date]) as yr,
month([Date]) as monthofyr,
sum(iif((day([Date])>=1) and (day([Date])<=3),1,0)) as days1to3,
sum(iif((day([Date])>=4) and (day([Date])<=6),1,0)) as days4to6,
sum(iif((day([Date])>=7) and (day([Date])<=9),1,0)) as days7to9,
sum(iif((day([Date])>=10) and (day([Date])<=12),1,0)) as days10to12,
sum(iif((day([Date])>=13) and (day([Date])<=15),1,0)) as days13to15,
sum(iif((day([Date])>=16) and (day([Date])<=18),1,0)) as days16to18,
sum(iif((day([Date])>=19) and (day([Date])<=21),1,0)) as days19to21,
sum(iif((day([Date])>=22) and (day([Date])<=24),1,0)) as days22to24,
sum(iif((day([Date])>=25) and (day([Date])<=27),1,0)) as days25to27,
sum(iif((day([Date])>=28) and (day([Date])<=31),1,0)) as days28to31
from tbl
where [Date] between x and y
group by year([Date]),
month([Date])
Replace x and y with your date range.
The last group is days 28 to 31 of the month, so it may contain 4 days' worth of orders, for months that have 31 days.
THE ABOVE IS A COUNT OF ORDERS.
If you want the SUM of the order amounts:
SUM OF ORDER AMOUNTS
select year([Date]) as yr,
month([Date]) as monthofyr,
sum(iif((day([Date])>=1) and (day([Date])<=3),[Order],0)) as days1to3,
sum(iif((day([Date])>=4) and (day([Date])<=6),[Order],0)) as days4to6,
sum(iif((day([Date])>=7) and (day([Date])<=9),[Order],0)) as days7to9,
sum(iif((day([Date])>=10) and (day([Date])<=12),[Order],0)) as days10to12,
sum(iif((day([Date])>=13) and (day([Date])<=15),[Order],0)) as days13to15,
sum(iif((day([Date])>=16) and (day([Date])<=18),[Order],0)) as days16to18,
sum(iif((day([Date])>=19) and (day([Date])<=21),[Order],0)) as days19to21,
sum(iif((day([Date])>=22) and (day([Date])<=24),[Order],0)) as days22to24,
sum(iif((day([Date])>=25) and (day([Date])<=27),[Order],0)) as days25to27,
sum(iif((day([Date])>=28) and (day([Date])<=31),[Order],0)) as days28to31
from tbl
where [Date] between x and y
group by year([Date]),
month([Date])

oracle sql: efficient way to calculate business days in a month

I have a pretty huge table with columns date, account, amount, etc. For example:
date account amount
4/1/2014 XXXXX1 80
4/1/2014 XXXXX1 20
4/2/2014 XXXXX1 840
4/3/2014 XXXXX1 120
4/1/2014 XXXXX2 130
4/3/2014 XXXXX2 300
...........
(I have 40 months' worth of daily data and multiple accounts.)
The final output I want is the average amount for each account for each month. Since there may or may not be a record for any account on a given day, and I have a separate table of holidays from 2011 to 2014, I am summing the amount of each account within a month and dividing it by the number of business days in that month. Note that there are very likely to be records on weekends/holidays, so I need to exclude them from the calculation. Also, I want a record for each date available in the original table. For example:
date account amount
4/1/2014 XXXXX1 48 ((80+20+840+120)/22)
4/2/2014 XXXXX1 48
4/3/2014 XXXXX1 48
4/1/2014 XXXXX2 19 ((130+300)/22)
4/3/2014 XXXXX2 19
...........
(Suppose the above is the only data I have for Apr-2014.)
I am able to do this in a hacky and slow way, but as I need to join this process with other subqueries, I really need to optimize this query. My current code looks like:
select
date,
account,
sum(amount/days_mon) over (partition by last_day(date))
from(
select
date,
-- there are more calculation to get the account numbers,
-- so this subquery is necessary
account,
amount,
-- this is a list of month-end dates that the number of
-- business days in that month is 19. similar below.
case when last_day(date) in ('','',...,'') then 19
when last_day(date) in ('','',...,'') then 20
when last_day(date) in ('','',...,'') then 21
when last_day(date) in ('','',...,'') then 22
when last_day(date) in ('','',...,'') then 23
end as days_mon
from mytable tb
inner join lookup_businessday_list busi
on tb.date = busi.date)
So how can I perform the above purpose efficiently? Thank you!
This approach uses sub-query factoring - what other RDBMS flavours call common table expressions. The attraction here is that we can pass the output from one CTE as input to another.
The first CTE generates a list of dates in a given month (you can extend this over any range you like).
The second CTE uses an anti-join on the first to filter out dates which are holidays and also dates which aren't weekdays. Note that the day number varies according to the NLS_TERRITORY setting: in my realm the weekend is days 6 and 7, but SQL Fiddle is American, so there it is 1 and 7.
with dates as ( select date '2014-04-01' + ( level - 1) as d
from dual
connect by level <= 30 )
, bdays as ( select d
, count(d) over () tot_d
from dates
left join holidays
on dates.d = holidays.hol_date
where holidays.hol_date is null
and to_number(to_char(dates.d, 'D')) between 2 and 6
)
select yt.account
, yt.txn_date
, sum(yt.amount) over (partition by yt.account, trunc(yt.txn_date,'MM'))
/tot_d as avg_amt
from your_table yt
join bdays
on bdays.d = yt.txn_date
order by yt.account
, yt.txn_date
/
I haven't rounded the average amount.
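The same CTE chain can be sketched in Python's `sqlite3` under a few assumptions: a recursive CTE replaces Oracle's `CONNECT BY LEVEL`, SQLite's `strftime('%w', d)` numbers days 0 (Sunday) to 6 (Saturday) regardless of locale, and the `holidays` table is left empty (no holidays assumed in April 2014). Unlike the answer, this sketch collapses the result to one row per account instead of repeating the average on every transaction row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (account TEXT, txn_date TEXT, amount REAL)")
conn.execute("CREATE TABLE holidays (hol_date TEXT)")  # assumed empty for April 2014
conn.executemany("INSERT INTO your_table VALUES (?, ?, ?)", [
    ("XXXXX1", "2014-04-01", 80), ("XXXXX1", "2014-04-01", 20),
    ("XXXXX1", "2014-04-02", 840), ("XXXXX1", "2014-04-03", 120),
    ("XXXXX2", "2014-04-01", 130), ("XXXXX2", "2014-04-03", 300),
])

rows = conn.execute("""
    WITH RECURSIVE dates(d) AS (          -- every calendar day in April 2014
        SELECT '2014-04-01'
        UNION ALL
        SELECT date(d, '+1 day') FROM dates WHERE d < '2014-04-30'
    ),
    bdays AS (                            -- anti-join drops holidays; keep Mon-Fri
        SELECT dates.d AS d
        FROM dates
        LEFT JOIN holidays ON dates.d = holidays.hol_date
        WHERE holidays.hol_date IS NULL
          AND CAST(strftime('%w', dates.d) AS INTEGER) BETWEEN 1 AND 5
    ),
    tot AS (SELECT COUNT(*) AS tot_d FROM bdays)
    SELECT yt.account, tot.tot_d, SUM(yt.amount) / tot.tot_d AS avg_amt
    FROM your_table AS yt
    JOIN bdays ON bdays.d = yt.txn_date   -- only transactions on business days
    CROSS JOIN tot
    GROUP BY yt.account, tot.tot_d
    ORDER BY yt.account
""").fetchall()

for r in rows:
    print(r)
```

April 2014 starts on a Tuesday and has 22 weekdays, so the sketch divides 1060 and 430 by 22, matching the divisor used in the question's example.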
You have 40 months of data; this data should be very stable.
I will assume that you have a cold body (a big, stable, easily definable range of data) and a hot tail (a small, active part).
Next, I would like to define a minimal period: the smallest data range that is of interest to the business.
It might be a year, month, day, hour, etc. Do you expect to get questions like "what was the average for that account between 1900 and 12am yesterday?"
I will assume that the answer is DAY.
Then:
I will calculate sum(amount) and count(*) for every account for every DAY of the cold body.
I will not create dummy records if a particular account had no activity on some day.
I will save day, account, total amount, and count in a TABLE.
If there are later modifications to the cold body, delete and reload the affected day in that table.
For the hot tail there might be multiple strategies:
1. Do the same as above (same process, easy to support).
2. Always calculate on the fly.
3. Use a materialized view as a middle ground between 1 and 2.
The cold body table totalc could also be implemented as a materialized view, but if the data never changes, there is no need to rebuild it.
With this you go from (number of accounts) x (number of transactions per day) x (number of days) records to (number of accounts) x (number of active days) records.
That should speed up all subsequent calculations.
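A minimal sketch of the cold-body summary table, using Python's `sqlite3` and the question's sample rows. The table name `daily_totals` and the helper `reload_day` are hypothetical; the point is that later monthly calculations read one row per active (day, account) pair instead of every transaction, and a late correction only requires rebuilding the affected day.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (txn_date TEXT, account TEXT, amount REAL)")
conn.executemany("INSERT INTO mytable VALUES (?, ?, ?)", [
    ("2014-04-01", "XXXXX1", 80), ("2014-04-01", "XXXXX1", 20),
    ("2014-04-02", "XXXXX1", 840), ("2014-04-03", "XXXXX1", 120),
    ("2014-04-01", "XXXXX2", 130), ("2014-04-03", "XXXXX2", 300),
])

# Pre-aggregate once per (day, account); inactive days get no dummy row.
conn.execute("""
    CREATE TABLE daily_totals AS
    SELECT txn_date AS day, account,
           SUM(amount) AS total_amount, COUNT(*) AS txn_count
    FROM mytable
    GROUP BY txn_date, account
""")

def reload_day(day):
    """Hypothetical helper: rebuild one day after the cold body changes."""
    conn.execute("DELETE FROM daily_totals WHERE day = ?", (day,))
    conn.execute("""
        INSERT INTO daily_totals
        SELECT txn_date, account, SUM(amount), COUNT(*)
        FROM mytable
        WHERE txn_date = ?
        GROUP BY txn_date, account
    """, (day,))

def monthly_summary():
    # Monthly figures come from the small summary table, not the raw rows.
    return conn.execute("""
        SELECT account, strftime('%Y-%m', day) AS mo,
               SUM(total_amount), SUM(txn_count)
        FROM daily_totals
        GROUP BY account, mo
        ORDER BY account
    """).fetchall()

before = monthly_summary()

# A late correction lands in the cold body; only that day is reloaded.
conn.execute("INSERT INTO mytable VALUES ('2014-04-02', 'XXXXX1', 10)")
reload_day("2014-04-02")
after = monthly_summary()

print(before)
print(after)
```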

Query assistance please

Given the following table (much simplified for the purposes of this question):
id perPeriod actuals createdDate
---------------------------------------------------------
1 14 22 2011-10-04 00:00:00.000
2 14 9 2011-10-04 00:00:00.000
3 14 3 2011-10-03 00:00:00.000
4 14 5 2011-10-03 00:00:00.000
I need a query that gives me the average daily "actuals" figure. Note, however, that there are TWO RECORDS PER DAY (often more), so I can't just do AVG(actuals).
Also, if the daily "actuals" average exceeds the daily "perPeriod" average, I want to take the perPeriod value instead of the "average" value. Thus, in the case of the first two records: The actuals average for 4th October is (22+9) / 2 = 15.5. And the perPeriod average for the same day is (14 + 14) / 2 = 14. Now, 15.5 is greater than 14, so the daily "actuals" average for that day should be the "perPeriod" average.
Hope that makes sense. Any pointers greatly appreciated.
EDIT
I need an overall daily average, not an average per date. As I said, I would love to just do AVG(actuals) on the entire table, but the complicating factor is that a particular day can occupy more than one row, which would skew the results.
Is this what you want?
First, if the second perPeriod average needed to be the average across a different grouping (it doesn't in this case), then you would need to use a subquery like this:
Select t.CreatedDate,
Case When Avg(actuals) < p.PerPeriodAvg
Then Avg(actuals) Else p.PerPeriodAvg End Average
From table1 t Join
(Select CreatedDate, Avg(perPeriod) PerPeriodAvg
From table1
Group By CreatedDate) as p
On p.CreatedDate = t.CreatedDate
Group By t.CreatedDate, p.PerPeriodAvg
or, in this case, since the perPeriod average is grouped on the same thing (CreatedDate) as the actuals average, you don't need a subquery, so it's even easier:
Select t.CreatedDate,
Case When Avg(actuals) < Avg(perPeriod)
Then Avg(actuals) Else Avg(perPeriod) End Average
From table1 t
Group By t.CreatedDate
with your sample data, both of these return
CreatedDate Average
----------------------- -----------
2011-10-03 00:00:00.000 4
2011-10-04 00:00:00.000 14
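The EDIT asks for a single overall daily average rather than one value per date. One way to get it, sketched here in Python's `sqlite3` with the question's four rows, is to compute the capped per-date average as above in a derived table and then average those per-date values, so each day counts once however many rows it has. SQLite's `AVG` returns a real number (15.5 for 4 October), whereas SQL Server truncates `AVG` over an int column; with this data the cap makes both give 14 for that day.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table1 (id INTEGER, perPeriod INTEGER, actuals INTEGER, createdDate TEXT)"
)
conn.executemany("INSERT INTO table1 VALUES (?, ?, ?, ?)", [
    (1, 14, 22, "2011-10-04"), (2, 14, 9, "2011-10-04"),
    (3, 14, 3, "2011-10-03"), (4, 14, 5, "2011-10-03"),
])

# Capped average per date: the smaller of the two daily averages.
capped = """
    SELECT createdDate,
           CASE WHEN AVG(actuals) < AVG(perPeriod)
                THEN AVG(actuals) ELSE AVG(perPeriod) END AS average
    FROM table1
    GROUP BY createdDate
"""

per_day = conn.execute(capped + " ORDER BY createdDate").fetchall()

# Average the per-date values so every day carries equal weight.
overall = conn.execute(f"SELECT AVG(average) FROM ({capped}) AS daily").fetchone()[0]

print(per_day)
print(overall)
```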
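SELECT DAY(createdDate), MONTH(createdDate), YEAR(createdDate),
CASE WHEN AVG(actuals) < MAX(perPeriod) THEN AVG(actuals) ELSE MAX(perPeriod) END AS average -- MIN() takes one argument, so pick the smaller value with CASE
FROM MyTable
GROUP BY DAY(createdDate), MONTH(createdDate), YEAR(createdDate)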
Try this out:
select createdDate,
case
when AVG(actuals) > max(perPeriod) then max(perPeriod)
else AVG(actuals)
end
from SomeTestTable
group by createdDate