Oracle: I need to select n rows from every k rows of a table - sql

For example:
My table has 10000 rows. First I will divide it in 5 sets of 2000(k) rows. Then from each set of 2000 rows I will select only top 100(n) rows.
With this approach I am trying to scan some rows of table with a specific pattern.

Assuming you are ordering them 1 - 10000 using some logic and want to output only rows 1-100,2001-2100,4001-4100,etc then you can use the ROWNUM pseudocolumn:
SELECT *
FROM (
SELECT t.*,
ROWNUM AS rn -- Secondly, assign a row number to the ordered rows
FROM (
SELECT *
FROM your_table
ORDER BY your_condition -- First, order the data
) t
)
WHERE MOD( rn - 1, 2000 ) < 100; -- Finally, filter the top 100 per 2000.
Or you could use the ROW_NUMBER() analytic function:
SELECT *
FROM (
SELECT t.*,
ROW_NUMBER() OVER ( ORDER BY your_condition ) AS rn
FROM your_table
)
WHERE MOD( rn - 1, 2000 ) < 100;
Is it possible to increase the set of sample data exponentially. Like 1k, 2k, 4k,8k....and then fetch some rows from these.
Replace the WHERE clause with:
WHERE rn - POWER(
2,
TRUNC( CAST( LOG( 2, CEIL( rn / 1000 ) ) AS NUMBER(20,4) ) )
) * 1000 + 1000 <= 100

This solution uses the analytic ntile() to split the raw data into five buckets. That result set is labelled using the analytic row_number() which provides a filter to produce the final set:
with sq1 as ( select id, col1, ntile(5) over (order by id asc) as quintile
from t23
)
, sq2 as ( select id, col1, quintile
, row_number() over ( partition by quintile order by id asc) as rn
from sq1 )
select *
from sq2
where rn <= 200
order by quintile, rn
/

use partition by and order by with row_number. it will look like following:
row_number()over(partition by partition_column order by order_column)<=100
partition_column will be your condition to divide set.
order_column will be your condition to select top 100.

Related

Random records in Oracle table based on conditions

I have a Oracle table with the following columns
Table Structure
In a query I need to return all the records with CPER>=40 which is trivial. However, apart from CPER>=40 I need to list 5 random records for each CPID.
I have attached a sample list of records. However, in my table I have around 50,000 records.
Appreciate if you can help.
Oracle solution:
with CTE as
(
select t1.*,
row_number() over(order by DBMS_RANDOM.VALUE) as rn -- random order assigned
from MyTable t1
where CPID <40
)
select *
from CTE
where rn <=5 -- pick 5 at random
union all
select t2.*, null
from my_table t2
where CPID >= 40
SQL Server:
with CTE as
(
select t1.*,
row_number() over(order by newid()) as rn -- random order assigned
from MyTable t1
where CPID <40
)
select *
from CTE
where rn <=5 -- pick 5 at random
union all
select t2.*, null
from my_table t2
where CPID >= 40
How about something like this...
SELECT *
FROM (SELECT CID,
CVAL,
CPID,
CPER,
Row_number() OVER (partition BY CPID ORDER BY CPID ASC ) AS RN
FROM Table) tmp
WHERE CPER>=40 OR pids <= 5
However, this is not random.
Assuming that you want five additional random records, you can do:
select t.*
from (select t.*,
row_number() over (partition by cpid,
(case when cper >= 40 then 1 else 2 end)
order by dbms_random.value
) as seqnum
from t
) t
where seqnum <= 5 or cper >= 40;
The row_number() is enumerating the rows for each cpid in two groups -- based on the cper value. The outer where is taking all cper values in the range you want as well as five from the other group.

Returning the results of multiple 'WITH CTE' queries as one result

I am trying to select the highest price from the same product over n periods of time i.e. last 5, 50, 100, 500.
At the moment I'm running the query four times for above periods like this:
;WITH CTE AS
(
SELECT TOP (500) * FROM Ticker WHERE ProductId='BTC-USD'
ORDER BY ID DESC
) SELECT TOP (1) * FROM CTE
ORDER BY PRICE desc
Is there a way I can get all the results at once in 4 rows?
Hmmmm . . . My first thought is a union all:
with cte as (
select top (500) t.*, row_number() over (order by id desc) as seqnum
from Ticker t
where ProductId = 'BTC-USD'
order by id desc
)
select 500 as which, max(cte.price) as max_price from cte where seqnum <= 500 union all
select 100 as which, max(cte.price) from cte where seqnum <= 100 union all
select 50 as which, max(cte.price) from cte where seqnum <= 50 union all
select 5 as which, max(cte.price) from cte where seqnum <= 5;
But, I have another idea:
with cte as (
select top (500) t.*, row_number() over (order by id desc) as seqnum
from Ticker t
where ProductId = 'BTC-USD'
order by id desc
)
select v.which, x.max_price
from (values (5), (50), (100), (500)) v(which) cross apply
(select max(price) as max_price from cte where seqnum <= which) x;
Of course, the "500" in the CTE needs to match the maximum value in v. You can actually get rid of the TOP in the CTE.

Applying LIMIT and OFFSET to MS SQL server 2008 queries

I need to apply LIMIT and OFFSET to original query (without modifying it) in MSSQL server 2008.
Let's say the original query is:
SELECT * FROM energy_usage
(But it can be any arbitrary SELECT query)
That's what I came up with so far:
1. It does what I need, but the query generates extra column row_number which I don't need.
WITH OrderedTable AS
(
SELECT ROW_NUMBER() OVER (ORDER BY (SELECT 1)) AS row_number, * FROM energy_usage
)
SELECT * FROM OrderedTable WHERE row_number BETWEEN 1 AND 10
2. This one doesn't work for some reason and returns the following error.
SELECT real_sql.* FROM (
SELECT ROW_NUMBER() OVER (ORDER BY (SELECT 1)) AS row_number, * FROM (SELECT * FROM energy_usage) as real_sql) as subquery
WHERE row_number BETWEEN 1 AND 10
More common case is:
SELECT real_sql.* FROM (
SELECT ROW_NUMBER() OVER (ORDER BY (SELECT 1)) AS row_number, * FROM (real sql query) as real_sql) as subquery
WHERE row_number BETWEEN {offset} + 1 AND {limit} + {offset}
Error:
The column prefix 'real_sql' does not match with a table name or alias
name used in the query.
Simply do not put it on SELECT list:
WITH OrderedTable AS
(
SELECT ROW_NUMBER() OVER (ORDER BY (SELECT 1)) AS row_number, *
FROM energy_usage
)
SELECT col1, col2, col3 FROM OrderedTable WHERE row_number BETWEEN 1 AND 10;
SELECT * is common anti-pattern and should be avoided anyway. Plus ORDER BY (SELECT 1) will not give you guarantee of stable sort between executions.
Second if you need only ten rows use:
SELECT TOP 10 *
FROM energy_usage
ORDER BY ...
Unfortunately you won't get something nice as Selecting all Columns Except One
WITH OrderedTable AS
(
SELECT ROW_NUMBER() OVER (ORDER BY (SELECT 1)) AS row_number, *
FROM energy_usage
)
SELECT * EXCEPT row_number FROM OrderedTable WHERE row_number BETWEEN 1 AND 10;
This would solve the problem.
DECLARE #offset INT = 1;
DECLARE #limit INT = 10;
WITH Filtered AS (
SELECT TOP (#offset + #limit) *
FROM energy_usage
ORDER BY 1 ASC
), Results AS (
SELECT TOP (#limit) *
FROM Filtered
ORDER BY 1 DESC
)
SELECT *
FROM Results
ORDER BY 1 ASC;

Oracle: Need to fetch the rows exponentially

I got below query from another post which selects 100 rows from every 2000 rows.
Like this: 1-100,2001-2100,4001-4100,6001-6100,8001-8100 and so on.
SELECT * FROM (SELECT t.*,ROWNUM AS rn FROM(SELECT * FROM your_table ORDER BY your_condition) t)WHERE MOD( rn - 1, 2000 ) < 100;
Now I want to select my data exponentially.Such that it will select 100 rows from first 1000 rows, then from next 2000 rows, then from next 4000 rows.
Like this: 1-100,2000-2100,4000-4100,8000-8100,16000-16100 and so on.
The idea is to scan rows with a specific pattern.
You asked this in a comment on your previous question and I answered there...
SELECT *
FROM (
SELECT t.*,
ROWNUM AS rn -- Secondly, assign a row number to the ordered rows
FROM (
SELECT *
FROM your_table
ORDER BY your_condition -- First, order the data
) t
)
WHERE rn - POWER( -- Finally, filter the top 100.
2,
TRUNC( CAST( LOG( 2, CEIL( rn / 1000 ) ) AS NUMBER(20,4) ) )
) * 1000 + 1000 <= 100
This will take the first 100 rows from the groups 1-1000, 1001-3000, 3001-7000, 7001-15000, etc.
Or, to get the rows:
1-100,2000-2100,4000-4100,8000-8100,16000-16100, 32000-32100 and so on.
Then:
WHERE CASE -- Finally, filter the top 100.
WHEN rn <= 2000 THEN rn
ELSE rn - POWER(
2,
TRUNC( CAST( LOG( 2, CEIL( rn / 1000 - 1 ) ) AS NUMBER(20,4) ) )
) * 1000
END <= 100
You could use power function and simple hierarchical query, then join it with your table. Here is example with all_objects view:
with rng as (select 0 num from dual union all
select 1000 * power(2, level) from dual connect by level < 10 )
select *
from (select row_number() over (order by object_name) rn, object_name from all_objects)
join rng on rn between num + 1 and num + 100
From what you describe, you can use logs to define the groups. This is probably close enough to what you want:
select t.*
from (select t.*,
row_number() over (floor(log(2, floor(1 + (seqnum - 1) / 1000) ))
order by col
) as seqnum_2
from (select t.*, row_number() over (order by col) as seqnum
from t
) t
where seqnum_2 <= 100;
The difference from your description is that the first group is 1-999, 1000-1999, and so on.

Compute median of column in SQL common table expression

In MSSQL2008, I am trying to compute the median of a column of numbers from a common table expression using the classic median query as follows:
WITH cte AS
(
SELECT number
FROM table
)
SELECT cte.*,
(SELECT
(SELECT (
(SELECT TOP 1 cte.number
FROM
(SELECT TOP 50 PERCENT cte.number
FROM cte
ORDER BY cte.number) AS medianSubquery1
ORDER BY cte.number DESC)
+
(SELECT TOP 1 cte.number
FROM
(SELECT TOP 50 PERCENT cte.number
FROM cte
ORDER BY cte.number DESC) AS medianSubquery2
ORDER BY cte.number ASC) ) / 2)) AS median
FROM cte
ORDER BY cte.number
The result set that I get is the following:
NUMBER MEDIAN
x1 x1
x1 x1
x1 x1
x2 x2
x3 x3
In other words, the "median" column is the same as the "number" column when I would expect the median column to be "x1" all the way down. I use a similar expression to compute the mode and it works fine over the same common table expression.
Here's a slightly different way to do it:
WITH cte AS
(
SELECT number
FROM table1
)
SELECT T1.number, T3.median
FROM cte T1,
(
SELECT AVG(number) AS median
FROM
(
SELECT number, ROW_NUMBER() OVER(ORDER BY number) AS rn
FROM cte
) T2
WHERE T2.rn = ((SELECT COUNT(*) FROM table1) + 1) / 2
OR T2.rn = ((SELECT COUNT(*) FROM table1) + 2) / 2
) T3
The problem with your query is that you are doing
SELECT TOP 1 cte.number FROM...
but it isn't correlated with the sub query it is correlated with the Outer query so the subquery is irrelevant. Which explains why you simply end up with the same value all the way down. Removing the cte. (as below) gives the median of the CTE. Which is a constant value. What are you trying to do?
WITH cte AS
( SELECT NUMBER
FROM master.dbo.spt_values
WHERE TYPE='p'
)
SELECT cte.*,
(SELECT
(SELECT (
(SELECT TOP 1 number
FROM
(SELECT TOP 50 PERCENT cte.number
FROM cte
ORDER BY cte.number) AS medianSubquery1
ORDER BY number DESC)
+
(SELECT TOP 1 number
FROM
(SELECT TOP 50 PERCENT cte.number
FROM cte
ORDER BY cte.number DESC) AS medianSubquery2
ORDER BY number ASC) ) / 2)) AS median
FROM cte
ORDER BY cte.number
Returns
NUMBER median
----------- -----------
0 1023
1 1023
2 1023
3 1023
4 1023
5 1023
6 1023
7 1023
This is not an entirely new answer as it mostly expands on Mark Byer's answer, but there are a couple of options for simplifying the query even further.
The first thing is to really make use of CTE's. Not only can you have multiple CTE's, but they can refer to each other. With this in mind, we can create an additional CTE to compute the median based on the results of the first. This encapsulates the median computation and leaves the actual SELECT to do only what it needs to do. Note that the ROW_NUMBER() had to be moved into the first CTE.
;WITH cte AS
(
SELECT number, ROW_NUMBER() OVER(ORDER BY number) AS rn
FROM table1
),
med AS
(
SELECT AVG(number) AS median
FROM cte
WHERE cte.rn = ((SELECT COUNT(*) FROM cte) + 1) / 2
OR cte.rn = ((SELECT COUNT(*) FROM cte) + 2) / 2
)
SELECT cte.number, med.median
FROM cte
CROSS JOIN med
And to further reduce complexity, you "could" use a custom CLR Aggregate to handle the Median (such as the one provided in the free SQL# library at http://www.SQLsharp.com/ [which I am the author of]).
;WITH cte AS
(
SELECT number
FROM table1
),
med AS
(
SELECT SQL#.Agg_Median(cte.number) AS median
FROM cte
)
SELECT cte.number, med.median
FROM cte
CROSS JOIN med