COUNT(*) of a windowed SQL range

In Postgres 9.1 I'm using a windowing function like so:
SELECT a.category_id, (dense_rank() over w) - 1
FROM (
_inner select_
) a
WINDOW w AS (PARTITION BY category_id ORDER BY score)
What I can't figure out is how to also select the total number of elements in the windowed partition. If I just use count(*) over w, it tells me how many elements I've seen in the window so far, not the total number in the window.
My core issue here is that cume_dist() counts from 1, not 0, for the number of rows before or equal to the current one. percent_rank() counts from 0, like I need, but then it also subtracts 1 from the total number of rows when it does its division.

SELECT
a.category_id,
(dense_rank() over w) - 1,
count(*) over (partition by category_id) --without order by
FROM (
_inner select_
) a
WINDOW w AS (PARTITION BY category_id ORDER BY score)
From the manual on Window Functions:
There is another important concept associated with window functions: for each row, there is a set of rows within its partition called its window frame. Many (but not all) window functions act only on the rows of the window frame, rather than of the whole partition. By default, if ORDER BY is supplied then the frame consists of all rows from the start of the partition up through the current row, plus any following rows that are equal to the current row according to the ORDER BY clause. When ORDER BY is omitted the default frame consists of all rows in the partition.
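The two behaviours can be seen side by side. A minimal sketch (my own toy data, not the asker's inner select) using Python's bundled sqlite3, whose window functions follow the same default-frame rule as Postgres:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE scores (category_id INTEGER, score INTEGER);
    INSERT INTO scores VALUES (1, 10), (1, 20), (1, 30), (2, 5), (2, 15);
""")

rows = con.execute("""
    SELECT category_id,
           (dense_rank() OVER w) - 1                AS rank0,
           count(*) OVER w                          AS running,  -- frame ends at current row
           count(*) OVER (PARTITION BY category_id) AS total     -- no ORDER BY: whole partition
    FROM scores
    WINDOW w AS (PARTITION BY category_id ORDER BY score)
    ORDER BY category_id, score
""").fetchall()

for r in rows:
    print(r)  # e.g. (1, 0, 1, 3): the running count grows, total stays at 3
```

The `running` column climbs 1, 2, 3 inside each category while `total` is constant, which is exactly the difference between the default frame (with ORDER BY) and the full-partition window (without).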

Related

postgresql how to get min and max value from specific range

I have this table of fingerprint attendance logs,
and I want to select the table with a result like this.
As you can see, the mindailylog field contains the minimum date of that day, and the maxdailylog field contains the max value of that day.
You can use window functions:
select t.*,
min(waktuabsen) over (partition by userid, waktuabsen::date),
max(waktuabsen) over (partition by userid, waktuabsen::date)
from t;
This does the minimum per userid and day. If you want the overall minimum per day, just remove userid from the partition by.
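A runnable sketch of the same idea with invented sample rows, in Python's sqlite3; SQLite's date() stands in for the Postgres ::date cast:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE t (userid INTEGER, waktuabsen TEXT);
    INSERT INTO t VALUES
        (1, '2019-01-01 08:00'), (1, '2019-01-01 17:00'),
        (1, '2019-01-02 08:05'), (2, '2019-01-01 09:00');
""")

rows = con.execute("""
    SELECT t.*,
           min(waktuabsen) OVER (PARTITION BY userid, date(waktuabsen)) AS mindailylog,
           max(waktuabsen) OVER (PARTITION BY userid, date(waktuabsen)) AS maxdailylog
    FROM t
    ORDER BY userid, waktuabsen
""").fetchall()

for r in rows:
    print(r)  # every row of a (user, day) pair carries that day's first and last stamp
```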

How do I group a set of entities (people) into 10 equal portions based on their usage, e.g. using Oracle's Toad 11g (SQL)

Hi, I have a list of 2+ million people and their usage, put in order from largest to smallest.
I tried ranking using row_number() over (partition by user column order by usage desc) as rnk,
but that didn't work; the results were crazy.
Simply put, I just want 10 equal groups, with the first group consisting of the highest usage, in the order in which I had first listed them.
HELP!
You can use ntile():
select t.*, ntile(10) over (order by usage desc) as usage_decile
from t;
The only caveat: this will divide the data into exactly 10 equal-sized groups. If usage values have duplicates, then users with the same usage may land in different deciles.
If you don't want that behavior, use a more manual calculation:
select t.*,
ceil(rank() over (order by usage desc) * 10 /
count(*) over ()
) as usage_decile
from t;
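To make the tie-splitting visible on a handful of rows, here is the same comparison scaled down to quartiles (ntile(4)) in Python's sqlite3. Toy data, not the 2-million-row table; and since stock SQLite has no ceil(), the manual tile uses the integer-arithmetic equivalent ceil(a/b) = (a + b - 1) / b:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (usage INTEGER)")
con.executemany("INSERT INTO t VALUES (?)",
                [(u,) for u in [90, 70, 70, 60, 50, 40, 30, 20]])

rows = con.execute("""
    SELECT usage,
           ntile(4) OVER (ORDER BY usage DESC) AS tile,
           -- integer ceil(rank * 4 / count): keeps tied usages in the same tile
           (rank() OVER (ORDER BY usage DESC) * 4 + count(*) OVER () - 1)
               / count(*) OVER () AS manual_tile
    FROM t
    ORDER BY usage DESC, tile
""").fetchall()

for r in rows:
    print(r)  # the two 70s get ntile tiles 1 and 2, but the same manual_tile (1)
```

The tied value 70 straddles an ntile boundary, so ntile() puts one copy in each group, while the rank-based formula keeps both copies in tile 1 at the cost of slightly uneven group sizes.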

SQL Server Lag function adding range

Hi, I am a newbie when it comes to SQL and was hoping someone can help me with this matter. I've been using the lag function here and there, but was wondering if there is a way to rewrite it into a summed range. So instead of the prior one month, I want to take the prior 12 months and sum them together for each period. I don't want to write 12 lines of lag, but was wondering if there is a way to get it with fewer lines of code. Note there will be nulls, and if one of the 12 records is null then the result should be null.
I know you can write a subquery to do this, but was wondering if this is possible. Any help would be much appreciated.
You want the "window frame" part of the window function. A moving 12-month average would look like:
select t.*,
sum(balance) over (order by period rows between 11 preceding and current row) as moving_sum_12
from t;
You can review window frames in the documentation.
If you want a cumulative sum, you can leave out the window frame entirely.
I should note that you can also do this using lag(), but it is much more complicated:
select t.*,
(balance +
lag(balance, 1, 0) over (order by period) +
lag(balance, 2, 0) over (order by period) +
. . .
lag(balance, 11, 0) over (order by period)
) as sum_12
from t;
This uses the little-known third argument to lag(), which is the default value to use if the record is not available. It replaces a coalesce().
EDIT:
If you want NULL if 12 values are not available, then use case and count() as well:
select t.*,
(case when count(*) over (order by period rows between 11 preceding and current row) = 12
then sum(balance) over (order by period rows between 11 preceding and current row)
end) as moving_sum_12
from t;
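Scaled down to a 3-period window so the output fits on a few rows, the frame-plus-guard pattern looks like this in Python's sqlite3 (toy balances, not the asker's table):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (period INTEGER, balance INTEGER)")
con.executemany("INSERT INTO t VALUES (?, ?)",
                [(1, 10), (2, 20), (3, 30), (4, 40), (5, 50)])

rows = con.execute("""
    SELECT period,
           sum(balance) OVER win AS moving_sum_3,
           -- NULL until a full 3-row window is available
           CASE WHEN count(*) OVER win = 3
                THEN sum(balance) OVER win
           END AS strict_sum_3
    FROM t
    WINDOW win AS (ORDER BY period ROWS BETWEEN 2 PRECEDING AND CURRENT ROW)
    ORDER BY period
""").fetchall()

for r in rows:
    print(r)  # periods 1 and 2 get NULL in strict_sum_3
```

Using count(balance) instead of count(*) would additionally treat a NULL balance inside the window as missing, matching the asker's "if one of the records is null then the result should be null" requirement.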

BigQuery - counting number of events within a sliding time frame

I would like to count the number of events within a sliding time frame.
For example, say I would like to know how many bids were in the last 1000 seconds for the Google stock (GOOG).
I'm trying the following query:
SELECT
symbol,
start_date,
start_time,
bid_price,
count(if(max(start_time)-start_time<1000,1,null)) over (partition by symbol order by start_time asc) cnt
FROM [bigquery-samples:nasdaq_stock_quotes.quotes]
where symbol = 'GOOG'
The logic is as follow: the partition window (by symbol) is ordered with the bid time (leaving alone the bid date for sake of simplicity).
For each window (defined by the row at the "head" of the window) I would like to count the number of rows whose start_time is within 1000 seconds of the "head" row's time.
I'm trying to use max(start_time) to get the top row in the window. This doesn't seem to work and I get an error:
Error: MAX is an analytic function and must be accompanied by an OVER clause.
Is it possible to have two analytic functions in one column (both count and max in this case)? Is there a different solution to the problem presented?
Try using the range function.
SELECT
symbol,
start_date,
start_time,
bid_price,
count(market_center) over (partition by symbol order by start_time RANGE 1000 PRECEDING) cnt
FROM [bigquery-samples:nasdaq_stock_quotes.quotes]
where symbol = 'GOOG'
order by 2, 3
I used market_center just as a counter, additional fields can be used as well.
Note: the RANGE clause is not documented in the BigQuery Query Reference; however, it is a standard SQL construct which appears to work in this case.
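The logic of RANGE 1000 PRECEDING (count every event whose timestamp falls within the last 1000 seconds of the current one, inclusive) can be sketched in plain Python with bisect over sorted timestamps; the data here is invented, not the NASDAQ sample:

```python
from bisect import bisect_left

def counts_in_window(times, window):
    """For each event, count the events t' with times[i] - window <= t' <= times[i].
    `times` must be sorted ascending."""
    return [i + 1 - bisect_left(times, t - window) for i, t in enumerate(times)]

times = [0, 100, 500, 1200, 1250, 3000]
print(counts_in_window(times, 1000))  # [1, 2, 3, 2, 3, 1]
```

This is also the way to see why the original max(start_time) attempt fails: the "head" of each frame is simply the current row itself, so no nested aggregate is needed.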

Filtering by window function result in Postgresql

Ok, initially this was just a joke we had with a friend of mine, but it turned into an interesting technical question :)
I have the following stuff table:
CREATE TABLE stuff
(
id serial PRIMARY KEY,
volume integer NOT NULL DEFAULT 0,
priority smallint NOT NULL DEFAULT 0
);
The table contains the records for all of my stuff, with respective volume and priority (how much I need it).
I have a bag with specified volume, say 1000. I want to select from the table all stuff I can put into a bag, packing the most important stuff first.
This seems like the case for using window functions, so here is the query I came up with:
select s.*, sum(volume) OVER previous_rows as total
from stuff s
where total < 1000
WINDOW previous_rows as
(ORDER BY priority desc ROWS between UNBOUNDED PRECEDING and CURRENT ROW)
order by priority desc
The problem with it, however, is that Postgres complains:
ERROR: column "total" does not exist
LINE 3: where total < 1000
If I remove this filter, total column gets properly calculated, results properly sorted but all stuff gets selected, which is not what I want.
So, how do I do this? How do I select only items that can fit into the bag?
I don't know if this qualifies as "more elegant", but it is written in a different manner than Cybernate's solution (although it is essentially the same).
WITH window_table AS
(
SELECT s.*,
sum(volume) OVER previous_rows as total
FROM stuff s
WINDOW previous_rows as
(ORDER BY priority desc ROWS between UNBOUNDED PRECEDING and CURRENT ROW)
)
SELECT *
FROM window_table
WHERE total < 1000
ORDER BY priority DESC
If by "more elegant" you mean something that avoids the sub-select, then the answer is "no"
I haven't worked with PostgreSQL. However, my best guess would be using an inline view.
SELECT a.*
FROM (
SELECT s.*, sum(volume) OVER previous_rows AS total
FROM stuff AS s
WINDOW previous_rows AS (
ORDER BY priority desc
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
)
ORDER BY priority DESC
) AS a
WHERE a.total < 1000;
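Both answers hinge on the same rule: WHERE is evaluated before window functions, so the window result has to be wrapped in a subquery (or CTE) before it can be filtered. A quick check of the derived-table version in Python's sqlite3, with made-up volumes:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE stuff (id INTEGER PRIMARY KEY,
                        volume INTEGER NOT NULL,
                        priority INTEGER NOT NULL);
    INSERT INTO stuff VALUES (1, 600, 5), (2, 300, 4), (3, 200, 3), (4, 400, 2);
""")

rows = con.execute("""
    SELECT *
    FROM (
        SELECT s.*,
               sum(volume) OVER (ORDER BY priority DESC
                                 ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS total
        FROM stuff AS s
    ) AS a
    WHERE a.total < 1000
    ORDER BY priority DESC
""").fetchall()

print(rows)  # only the items whose running total stays under the bag's capacity
```

With these numbers the running totals are 600, 900, 1100, 1500, so only the two highest-priority items fit the 1000-volume bag; everything from the first overflow onward is filtered out.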