how to use median as a analytic function (oracle SQL) - sql

Can you explain why the following works:
select recdate,avg(logtime)
over
(ORDER BY recdate rows between 10 preceding and 0 following) as logtime
from v_download_times;
and the following doesn’t
select recdate,median(logtime)
over
(ORDER BY recdate rows between 10 preceding and 0 following) as logtime
from v_download_times;
(median instead of avg)
I get an ORA-30487 error.
and I would be grateful for a workaround.

The error message is ORA-30487: ORDER BY not allowed here. And sure enough, if we consult the documentation for the MEDIAN function it says:
"You can use MEDIAN as an analytic function. You can specify only the
query_partition_clause in its OVER clause."

But it is not redundant if you only want to take it from a certain number of rows preceding the current one.
A way around may be limiting your data set just for the median purpose, like
select
median(field) over (partition by field2)
from ( select * from dataset
where period_back between 0 and 2 )

MEDIAN doesn't allow an ORDER BY clause. As APC points out in his answer, the documentation tells us we can only specify the query_partition_clause.
ORDER BY is redundant as we're looking for the central value -- it's the same regardless of order.

Related

How to get correct cumulative balance?

I have this simple query
select ArticleID, Prix, Qte, InfStock
, SUM(Qte*InfStock) OVER (Partition BY ArticleID ORDER BY DateDocument) AS CUMUL
FROM Balance
Please look the result (Line 4)
Here is the backup file(zipped)
You need to add to the end of the OVER clause ROWS UNBOUNDED PRECEDING.
SUM defaults to RANGE UNBOUNDED PRECEDING which can cause issues like this.
See here for example, for further explanation.

Is there a way to calculate percentile using percentile_cont() function over a rolling window in Big Query?

I have a dataset with the following columns
city
user
week
month
earnings
Ideally I want to calculate a 50th % from percentile_cont(earnings,0.5) over (partition by city order by month range between 1 preceding and current row). But Big query doesn't support window framing in percentile_cont. Can anyone please help me if there is a work around this problem.
If I understand correctly, you can aggregate into an array and then unnest:
select t.*,
(select percentile_cont(earning) over ()
from unnest(ar_earnings) earning
limit 1
) as median_2months
from (select t.*,
array_agg(earnings) over (partition by city
order by month
range between 1 preceding and current month
) as ar_earnings
from t
) t;
You don't provide sample data, but this version assumes that month is an incrementing integer that represents the month. You may need to adjust the range depending on the type.

How do I group a set of entities (people) into 10 equal portions but based on their usage for EX using Oracle's Toad 11g (SQL)

Hi I have a list of 2+ mil people and their usage put in order from largest to smallest.
I tried ranking using row_number () over (partition by user column order by usage desc) as rnk
but that didnt work ..the results were crazy.
Simply put, I just want 10 equal groups of 10 with the first group consisting of the highest usage in the order of which i had first listed them.
HELP!
You can use ntile():
select t.*, ntile(10) over (order by usage desc) as usage_decile
from t;
The only caveat: This will divide the data into exactly 10 equal sized groups. If usage values have duplicates, then users with the same usage will be in different deciles.
If you don't want that behavior, use a more manual calculation:
select t.*,
ceil(rank() over (order by usage desc) * 10 /
count(*) over ()
) as usage_decile
from t;

Why Window Functions Require My Aggregated Column in Group

I have been working with window functions a fair amount but I don't think I understand enough about how they work to answer why they behave the way they do.
For the query that I was working on (below), why am I required to take my aggregated field and add it to the group by? (In the second half of my query below I am unable to produce a result if I don't include "Events" in my second group by)
With Data as (
Select
CohortDate as month
,datediff(week,CohortDate,EventDate) as EventAge
,count(distinct case when EventDate is not null then GUID end) as Events
From MyTable
where month >= [getdate():month] - interval '12 months'
group by 1, 2
order by 1, 2
)
Select
month
,EventAge
,sum(Events) over (partition by month order by SubAge asc rows between unbounded preceding and current row) as TotEvents
from data
group by 1, 2, Events
order by 1, 2
I have run into this enough that I have just taken it for granted, but would really love some more color as to why this is needed. Is there a way I should be formatting these differently in order to avoid this (somewhat non-intuitive) requirement?
Thanks a ton!
What you are looking for is presumably a cumulative sum. That would be:
select month, EventAge,
sum(sum(Events)) over (partition by month
order by SubAge asc
rows between unbounded preceding and current row
) as TotEvents
from data
group by 1, 2
order by 1, 2 ;
Why? That might be a little hard to explain. Perhaps if you see the equivalent version with a subquery it will be clearer:
select me.*
sum(sum_events) over (partition by month
order by SubAge asc
rows between unbounded preceding and current row
) as TotEvents
from (select month, EventAge, sum(events) as sum_events
from data
group by 1, 2
) me
order by 1, 2 ;
This is pretty much an exactly shorthand for the query. The window function is evaluated after aggregation. You want to sum the SUM of the events after the aggregation. Hence, you need sum(sum(events)). After the aggregation, events is no longer available.
The nesting of aggregation functions is awkward at first -- at least it was for me. When I first started using window functions, I think I first spent a few days writing aggregation queries using subqueries and then rewriting without the subqueries. Quickly, I got used to writing them without subqueries.

SQL Server Lag function adding range

Hi I am a newbie when it comes to SQL and was hoping someone can help me in this matter. I've been using the lag function here and there but was wondering if there is a way to rewrite it to make it into a sum range. So instead of prior one month, i want to take the prior 12 months and sum them together for each period. I don't want to write 12 lines of lag but was wondering if there is a way to get it with less lines of code. Note there will be nulls and if one of the 12 records is null then it should be null.
I know you can write write subquery to do this, but was wondering if this is possible. Any help would be much appreciated.
You want the "window frame" part of the window function. A moving 12-month average would look like:
select t.*,
sum(balance) over (order by period rows between 11 preceding and current row) as moving_sum_12
from t;
You can review window frames in the documentation.
If you want a cumulative sum, you can leave out the window frame entirely.
I should note that you can also do this using lag(), but it is much more complicated:
select t.*,
(balance +
lag(balance, 1, 0) over (order by period) +
lag(balance, 2, 0) over (order by period) +
. . .
lag(balance, 11, 0) over (order by period) +
) as sum_1112
from t;
This uses the little known third argument to lag(), which is the default value to use if the record is not available. It replaces a coalesce().
EDIT:
If you want NULL if 12 values are not available, then use case and count() as well:
select t.*,
(case when count(*) over (order by period rows between 11 preceding and current row) = 12
then sum(balance) over (order by period rows between 11 preceding and current row)
end) as moving_sum_12
from t;