generate_series() equivalent in DB2


I'm looking for the DB2 equivalent of generate_series() (the PostgreSQL way of generating rows). I obviously don't want to hard-code the rows with a VALUES statement.
select * from generate_series(2,4);
generate_series
-----------------
2
3
4
(3 rows)

The WHERE clause needs to state the bounds of the recursion more explicitly for DB2 to suppress its warning that the recursive query may contain an infinite loop. Here's a slightly adjusted version of the recursive query that does not trigger the warning:
with dummy(id) as (
select 2 from SYSIBM.SYSDUMMY1
union all
select id + 1 from dummy where id < 4
)
select id from dummy

I managed to write a recursive query that fits:
with dummy(id) as (
select 2 from SYSIBM.SYSDUMMY1
union all
select id + 1 from dummy where id + 1 between 2 and 4
)
select id from dummy
The query can be adapted to whatever for(;;) you can dream of.
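
To illustrate how the recursive query generalises to an arbitrary start, stop and step, here is a minimal sketch in Python. It runs against SQLite (via the standard sqlite3 module) rather than DB2, so SYSIBM.SYSDUMMY1 is dropped and the RECURSIVE keyword is added; the helper name generate_series and its step parameter are assumptions made for this example only.

```python
import sqlite3


def generate_series(start, stop, step=1):
    """Return start, start + step, ... (not exceeding stop) via a recursive CTE.

    Hypothetical helper: SQLite stands in for DB2 here, so the anchor row
    needs no dummy table.
    """
    if step <= 0:
        raise ValueError("step must be positive")
    sql = """
        WITH RECURSIVE dummy(id) AS (
            SELECT ? WHERE ? <= ?          -- anchor row, skipped if start > stop
            UNION ALL
            SELECT id + ? FROM dummy
            WHERE id + ? <= ?              -- explicit upper bound on the recursion
        )
        SELECT id FROM dummy ORDER BY id
    """
    conn = sqlite3.connect(":memory:")
    try:
        rows = conn.execute(sql, (start, start, stop, step, step, stop)).fetchall()
    finally:
        conn.close()
    return [row[0] for row in rows]


if __name__ == "__main__":
    print(generate_series(2, 4))
```

The bound is checked on id + step, as in the query above, so the recursion never emits a value past stop.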

Related

How to limit number of groups returned in a query, but not the number of rows in Oracle

How to limit the number of groups in a query, but not the number of rows in Oracle?
If I had to do that manually, I would have to use a DISTINCT.
Would be something like this:
FOR d IN (
SELECT DISTINCT COLUMN_1 FROM myTable
WHERE myDate BETWEEN x AND y
OFFSET o ROWS
FETCH NEXT l ROWS ONLY
) LOOP
And then, do the selects from each of the ids returned in the query, which, in my opinion, is a terrible solution.
SAMPLE DATA:
If I limit the number of groups to 2 by using COLUMN_2, the expected result should be something like:

I believe you may be looking for something like this:
select *
from my_table
where id in (
select distinct id
from my_table
where my_date between x and y
fetch first :n rows only
)
;
:n is a bind variable, encoding the number of groups you want to select.
This should be more efficient than solutions using analytic functions - even if it must read the base table twice. In tests posted on OTN, I showed that the difference is not small.
EDIT If I remember correctly, FETCH is not implemented in the most efficient way (perhaps for good reasons, having to do with features we don't need in this query - such as how to deal with ties). FETCH itself resembles a DENSE_RANK() implementation rather than the faster row limiting clause (using ROWNUM). I would likely need to modify the query to do away with FETCH, if speed was really important. END EDIT
Further edit to do with performance comparisons
Frequent poster MT0 requested a pointer for the claim that aggregate solutions can be (and often are) more efficient than analytic function approaches, even when the former may require multiple passes through the data where the analytic function approach requires only one.
Alas, OTN (what now calls itself the "Oracle Groundbreakers Developer Community", the discussion board hosted by Oracle itself) went through a massive - and massively botched - platform change at the end of September 2020; that messed up both the search facilities and the formatting of old posts, to the point of rendering them almost unusable.
Instead, I will show here a simple mock-up of the OP's problem in this thread; code that anyone can run so they can repeat the tests on their own machine.
I created a table with two columns, ID and STR - the ID plays the same role as in the OP's question, and STR is just extra payload to mimic real-life data. ID is number and STR is varchar2(100). I populated the table with 9 million rows - 1 million ID's, nine rows for each ID. The task is to select just three "groups" (three distinct ID's, then select all the rows from the base table for those three distinct ID's).
With no index on the ID column, the aggregate solution runs in 0.81 seconds on my machine; with an index on ID, it runs in 0.47 seconds. The analytic functions solution runs in 0.91 seconds, with or without an index (obviously - there is no way an index can benefit the analytic function solution). All these results are for column ID not declared NOT NULL.
Here is the code to create the table, the index on ID, and the two queries I tested. Note: As I explained in my first edit (above), fetch is slow; I replaced it with a standard row-limiting technique using ROWNUM in an over-query.
drop table t purge;
create table t (id number, str varchar2(100));
insert into t
with row_gen as (select level from dual connect by level <= 3000)
select mod(344227 * rownum, 1000000), rpad('x', 100, 'x')
from row_gen cross join row_gen
;
commit;
create index t_idx on t(id);
select *
from t
where id in (
select id from (select distinct id from t)
where rownum <= 3
);
select *
from ( select t.*, dense_rank() over (order by id) dr from t )
where dr <= 3;
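
For readers without an Oracle instance at hand, the two query shapes can be compared on a toy table. The sketch below is in Python against SQLite (window functions need SQLite 3.25 or later), so ROWNUM is replaced by ORDER BY ... LIMIT. It only checks that both approaches return the same rows; it says nothing about the timings reported above. The function name first_groups and the 20-row sample data are assumptions for this example.

```python
import sqlite3


def first_groups(n_groups=3):
    """Return (aggregate_rows, analytic_rows) for the first n_groups ids."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE t (id INTEGER, str TEXT)")
        # 5 distinct ids, 4 rows each (made-up sample data)
        conn.executemany(
            "INSERT INTO t VALUES (?, ?)",
            [(i % 5, f"row{i:02d}") for i in range(20)],
        )
        # Aggregate approach: pick the ids first, then read the base table again.
        aggregate = conn.execute(
            """
            SELECT id, str FROM t
            WHERE id IN (SELECT DISTINCT id FROM t ORDER BY id LIMIT ?)
            ORDER BY id, str
            """,
            (n_groups,),
        ).fetchall()
        # Analytic approach: one pass, rank the groups, filter on the rank.
        analytic = conn.execute(
            """
            SELECT id, str
            FROM (SELECT t.*, DENSE_RANK() OVER (ORDER BY id) AS dr FROM t)
            WHERE dr <= ?
            ORDER BY id, str
            """,
            (n_groups,),
        ).fetchall()
    finally:
        conn.close()
    return aggregate, analytic


if __name__ == "__main__":
    agg, ana = first_groups()
    print(agg == ana, len(agg))
```

Unlike the ROWNUM version, which picks an arbitrary three ids, the subquery here is ordered so that both queries select the same groups.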

You can use DENSE_RANK:
SELECT *
FROM (
SELECT t.*,
DENSE_RANK() OVER ( ORDER BY column2 ) AS rnk
FROM table_name t
)
WHERE rnk <= 2;
Which, for the sample data:
CREATE TABLE table_name ( column1, column2, column3, column4 ) AS
SELECT 1, 1, 1.0, 1.0 FROM DUAL UNION ALL
SELECT 2, 2, 2.0, 2.0 FROM DUAL UNION ALL
SELECT 2, 2, 2.2, 2.1 FROM DUAL UNION ALL
SELECT 2, 2, 2.2, 2.2 FROM DUAL UNION ALL
SELECT 2, 2, 2.0, 2.3 FROM DUAL UNION ALL
SELECT 3, 3, 3.0, 3.1 FROM DUAL UNION ALL
SELECT 3, 3, 3.1, 3.1 FROM DUAL UNION ALL
SELECT 3, 3, 3.1, 3.1 FROM DUAL UNION ALL
SELECT 4, 4, 4.2, 4.0 FROM DUAL;
Outputs:
COLUMN1 | COLUMN2 | COLUMN3 | COLUMN4 | RNK
------: | ------: | ------: | ------: | --:
1 | 1 | 1 | 1 | 1
2 | 2 | 2 | 2 | 2
2 | 2 | 2.2 | 2.1 | 2
2 | 2 | 2.2 | 2.2 | 2
2 | 2 | 2 | 2.3 | 2
(and, if you want DISTINCT rows then add DISTINCT to the outer query)

If I understand correctly, you want ROW_NUMBER():
SELECT t.*
FROM (SELECT t.*,
ROW_NUMBER() OVER (PARTITION BY id ORDER BY id) as seqnum
FROM myTable t
WHERE t.myDate BETWEEN x AND y
) t
WHERE seqnum = 1;
This returns an arbitrary row for each id meeting the conditions.

Rewrite a query with "LEVEL" in snowflake

How can I write this SQL query in Snowflake?
SELECT LEVEL lv FROM DUAL CONNECT BY LEVEL <= 3;

Good starting points are the Snowflake documentation on CONNECT BY (https://docs.snowflake.com/en/sql-reference/constructs/connect-by.html) and on hierarchical queries (https://docs.snowflake.com/en/user-guide/queries-hierarchical.html).
Snowflake also supports recursive CTEs.

It seems like you want to duplicate the rows three times. For a fixed, small multiplier such as 3, you could just enumerate the numbers:
select c.lv, a.*
from abc a
cross join (select 1 lv union all select 2 union all select 3) c
A more generic approach, in the spirit of the original query, uses a standard recursive common-table-expression to generate the numbers:
with cte as (
select 1 lv
union all select lv + 1 from cte where lv < 3
)
select c.lv, a.*
from abc a
cross join cte c
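
As a quick check of the row-multiplying idea, here is a minimal sketch in Python against SQLite (not Snowflake). The table abc comes from the query above; its single column name, the sample values and the helper duplicate_rows are assumptions for this example. The recursion stops at lv < copies so that exactly `copies` numbers come out.

```python
import sqlite3


def duplicate_rows(names, copies=3):
    """Return every row of abc repeated `copies` times, tagged with lv = 1..copies."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE abc (name TEXT)")
        conn.executemany("INSERT INTO abc VALUES (?)", [(n,) for n in names])
        rows = conn.execute(
            """
            WITH RECURSIVE cte(lv) AS (
                SELECT 1
                UNION ALL
                SELECT lv + 1 FROM cte WHERE lv < ?   -- last generated lv equals copies
            )
            SELECT c.lv, a.name
            FROM abc a
            CROSS JOIN cte c
            ORDER BY a.name, c.lv
            """,
            (copies,),
        ).fetchall()
    finally:
        conn.close()
    return rows


if __name__ == "__main__":
    for row in duplicate_rows(["x", "y"]):
        print(row)
```

Note that the anchor row is always emitted, so this sketch assumes copies is at least 1.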

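LEVEL does exist in Snowflake. The differences from Oracle are:
In Snowflake it is necessary to use PRIOR in the CONNECT BY expression.
You can't select LEVEL on its own; the select list must also contain an existing column.
Example (the condition is tested against the prior row's level, so as written it returns levels 1 through 4; tighten it to prior LEVEL < 3 if you need exactly three rows):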
SELECT LEVEL, dummy FROM
(select 'X' dummy ) DUAL
CONNECT BY prior LEVEL <= 3;
LEVEL DUMMY
1 X
2 X
3 X
4 X

Following @Danil Suhomlinov's post, this can be simplified further by selecting COLUMN1, the single column of the special table dual:
SELECT LEVEL, column1 FROM dual
CONNECT BY prior LEVEL <= 3;
LEVEL | COLUMN1
------+--------
1 |
2 |
3 |
4 |

Execute both the query and the COUNT of the query

I'm trying to build a query or PL/SQL code that both executes a query and returns the number of results of this query.
Is this possible in a single query? Right now I feel like I'm being very wasteful: I first wrap the query in a COUNT (without ORDER BY) and then run the same query again (without the COUNT). A difference of a few seconds will probably not change the total number of rows, I can live with that.
The DB I'm using is Oracle Enterprise 12.2

An easy SQL way:
a test table:
create table testTable(a, b, c) as (
select 1, 'one', 'XX' from dual UNION ALL
select 2, 'two', 'YY' from dual UNION ALL
select 3, 'three', 'ZZ' from dual
)
A simple query:
select a, b, c
from testTable
A B C
---------- ----- --
1 one XX
2 two YY
3 three ZZ
3 rows selected.
The query with the number of records:
select a, b, c, count(*) over (partition by 1) as count
from testTable
A B C COUNT
---------- ----- -- ----------
1 one XX 3
2 two YY 3
3 three ZZ 3
3 rows selected.
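
The same idea can be tried outside Oracle. Below is a minimal sketch in Python against SQLite (3.25 or later for window functions), using the test table from above. It uses an empty OVER () clause, which has the same effect as partitioning by a constant; the function name rows_with_total and the alias total_count are assumptions for this example.

```python
import sqlite3


def rows_with_total():
    """Return each row of testTable together with the total row count."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE testTable (a INTEGER, b TEXT, c TEXT)")
        conn.executemany(
            "INSERT INTO testTable VALUES (?, ?, ?)",
            [(1, "one", "XX"), (2, "two", "YY"), (3, "three", "ZZ")],
        )
        # COUNT(*) OVER () counts over the whole result set and repeats
        # that total on every row.
        rows = conn.execute(
            "SELECT a, b, c, COUNT(*) OVER () AS total_count "
            "FROM testTable ORDER BY a"
        ).fetchall()
    finally:
        conn.close()
    return rows


if __name__ == "__main__":
    for row in rows_with_total():
        print(row)
```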

You can try to do something like:
WITH test_query
AS (SELECT LEVEL just_a_number
FROM dual
CONNECT BY level < 101)
SELECT just_a_number
, COUNT(just_a_number) OVER (ORDER BY 1) total_count
FROM test_query;
Here COUNT(just_a_number) OVER (ORDER BY 1) returns the total number of rows on every row: ordering by a constant makes all rows peers, so the running count covers the whole result set. Note that it may slow down the query.

Typically, when I do something like this, I create a stored procedure that returns 2 values. The first would be the result set as a REF CURSOR, and the other a number(12,0) returning the count. Both would of course require separate queries, but since it is in a single stored procedure, it is only one database connection and command being executed.
You are of course right for having your COUNT query forgo the ORDER BY clause.
To answer your question, you are not being wasteful per se. This is common practice in enterprise software.

UNION between select with 0 rows and select with 20745 gives 20740

In SQL Server 2008 I have a behaviour I don't understand.
I'm doing a UNION between two select statements.
First select returns 20745 rows
Second select returns 0 rows
When I use UNION between the two selects, I get 20740 rows; I would expect 20745, as UNION only returns distinct values.
To get the expected result I used UNION ALL, but there is something I don't understand about it. Does anyone have an explanation?

There must be duplicate rows in your first SELECT statement. Note that UNION eliminates duplicates from your result set.
If you want to return all rows, use UNION ALL instead.
Example:
--UNION ALL
WITH TableA(n) AS (
SELECT * FROM (
VALUES(1),(2),(3),(4),(1)
)t(n)
),
TableB(n) AS (
SELECT * FROM (
VALUES(10),(20),(30),(40)
)t(n)
)
SELECT n FROM TableA UNION ALL
SELECT n FROM TableB
The above will return:
n
-----------
1
2
3
4
1
10
20
30
40
While the UNION variant
SELECT n FROM TableA UNION
SELECT n FROM TableB
will return:
n
-----------
1
2
3
4
10
20
30
40

UNION removes duplicate results, regardless of whether they come from two different selects or from the same one. If you want to preserve duplicates, use UNION ALL instead:
SELECT *
FROM table1
UNION ALL
SELECT *
FROM table2

First select statement has duplicates :) That's normal behavior.
Try putting a distinct in the first select statement - it should also return 20740 rows.
That should help you better understand what is happening.
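
The suggested check can be scripted. The sketch below is in Python against SQLite rather than SQL Server; the table names table1 and table2 follow the answer above, while the helper union_counts and the sample values are assumptions for this example. It counts the rows of the first select, its DISTINCT rows, and both union variants.

```python
import sqlite3


def union_counts(first, second):
    """Return row counts for table1, DISTINCT table1, UNION and UNION ALL."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE table1 (n INTEGER)")
        conn.execute("CREATE TABLE table2 (n INTEGER)")
        conn.executemany("INSERT INTO table1 VALUES (?)", [(v,) for v in first])
        conn.executemany("INSERT INTO table2 VALUES (?)", [(v,) for v in second])
        queries = {
            "first": "SELECT COUNT(*) FROM table1",
            "first_distinct": "SELECT COUNT(*) FROM (SELECT DISTINCT n FROM table1)",
            "union": "SELECT COUNT(*) FROM "
                     "(SELECT n FROM table1 UNION SELECT n FROM table2)",
            "union_all": "SELECT COUNT(*) FROM "
                         "(SELECT n FROM table1 UNION ALL SELECT n FROM table2)",
        }
        return {name: conn.execute(sql).fetchone()[0] for name, sql in queries.items()}
    finally:
        conn.close()


if __name__ == "__main__":
    # One duplicate in the first select, an empty second select.
    print(union_counts([1, 2, 3, 4, 1], []))
```

With an empty second select, the UNION count equals the DISTINCT count of the first select, which is the pattern described in the question.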

Generating Random Number In Each Row In Oracle Query

I want to select all rows of a table followed by a random number between 1 and 9:
select t.*, (select dbms_random.value(1,9) num from dual) as RandomNumber
from myTable t
But the random number is the same from row to row, only different from each run of the query. How do I make the number different from row to row in the same execution?

Something like?
select t.*, round(dbms_random.value() * 8) + 1 from foo t;
Edit:
David has pointed out this gives uneven distribution for 1 and 9.
As he points out, the following gives a better distribution:
select t.*, floor(dbms_random.value(1, 10)) from foo t;

At first I thought that this would work:
select DBMS_Random.Value(1,9) output
from ...
However, rounding that value to an integer does not generate an even distribution of output values:
select output,
count(*)
from (
select round(dbms_random.value(1,9)) output
from dual
connect by level <= 1000000)
group by output
order by 1
1 62423
2 125302
3 125038
4 125207
5 124892
6 124235
7 124832
8 125514
9 62557
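The reason is that ROUND maps only the half-width intervals [1, 1.5) and [8.5, 9) to 1 and 9, while every other integer gets a full-width interval.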
I'd suggest using something like:
floor(dbms_random.value(1,10))
Hence:
select output,
count(*)
from (
select floor(dbms_random.value(1,10)) output
from dual
connect by level <= 1000000)
group by output
order by 1
1 111038
2 110912
3 111155
4 111125
5 111084
6 111328
7 110873
8 111532
9 110953
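
The same experiment can be repeated without a database. The sketch below is in Python and mimics the two expressions with a seeded generator; it assumes that dbms_random.value(low, high) is uniform on [low, high) and that ROUND rounds halves upward. The function name simulate, the sample size and the seed are arbitrary choices for this example.

```python
import math
import random
from collections import Counter


def simulate(n=90_000, seed=0):
    """Count outcomes of round(value(1,9)) and floor(value(1,10)) over n draws."""
    rng = random.Random(seed)
    rounded = Counter()
    floored = Counter()
    for _ in range(n):
        u = rng.random()                          # uniform on [0, 1)
        # round(dbms_random.value(1, 9)): value is 1 + 8u, rounded half up
        rounded[math.floor(1 + 8 * u + 0.5)] += 1
        # floor(dbms_random.value(1, 10)): value is 1 + 9u
        floored[math.floor(1 + 9 * u)] += 1
    return rounded, floored


if __name__ == "__main__":
    rounded, floored = simulate()
    for k in range(1, 10):
        print(k, rounded[k], floored[k])
```

In the rounded column the counts for 1 and 9 come out at roughly half of the others, while the floored column is flat.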

You don't need a select ... from dual, just write:
SELECT t.*, dbms_random.value(1,9) RandomNumber
FROM myTable t

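If you just use ROUND, the two end numbers (1 and 9) will occur less frequently. The following gets much closer to an even distribution of integers between 1 and 9, although 1 and 2 still come up slightly less often than the rest (probability 3/28 instead of 11/98), because ROUND still halves the intervals for 1 and 99: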
SELECT MOD(Round(DBMS_RANDOM.Value(1, 99)), 9) + 1 FROM DUAL
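
How even the MOD(ROUND(...)) expression is can be worked out exactly rather than by sampling. The sketch below is in Python and uses exact fractions; it assumes that DBMS_RANDOM.Value(low, high) is uniform on [low, high) with integer bounds and that ROUND(x) = n exactly when x lies in [n - 0.5, n + 0.5). The function name mod_round_distribution is an assumption for this example.

```python
from fractions import Fraction


def mod_round_distribution(low=1, high=99, buckets=9):
    """Exact P(MOD(ROUND(x), buckets) + 1 = k) for x uniform on [low, high)."""
    half = Fraction(1, 2)
    probs = {k: Fraction(0) for k in range(1, buckets + 1)}
    for n in range(low, high + 1):
        # Part of [n - 0.5, n + 0.5) that lies inside [low, high)
        left = max(Fraction(low), n - half)
        right = min(Fraction(high), n + half)
        if right > left:
            probs[n % buckets + 1] += (right - left) / (high - low)
    return probs


if __name__ == "__main__":
    for k, p in mod_round_distribution().items():
        print(k, p, float(p))
```

Under these assumptions the outputs 1 and 2 each get probability 3/28 and the other seven get 11/98, so the distribution is close to even but not exactly so.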
