SQL: create sequential list of numbers from various starting points - sql

I'm stuck on this SQL problem.
I have a column that is a list of starting points (prevdoc), and anther column that lists how many sequential numbers I need after the starting point (exdiff).
For example, here are the first several rows:
prevdoc | exdiff
----------------
1 | 3
21 | 2
126 | 2
So I need an output to look something like:
2
3
4
22
23
127
128
I'm lost as to where even to start. Can anyone advise me on the SQL code for this solution?
Thanks!

;with a as
(
select prevdoc + 1 col, exdiff
from <table> where exdiff > 0
union all
select col + 1, exdiff - 1
from a
where exdiff > 1
)
select col

If your exdiff is going to be a small number, you can make up a virtual table of numbers using SELECT..UNION ALL as shown here and join to it:
select prevdoc+number
from doc
join (select 1 number union all
select 2 union all
select 3 union all
select 4 union all
select 5) x on x.number <= doc.exdiff
order by 1;
I have provided for 5 but you can expand as required. You haven't specified your DBMS, but in each one there will be a source of sequential numbers, for example in SQL Server, you could use:
select prevdoc+number
from doc
join master..spt_values v on
v.number <= doc.exdiff and
v.number >= 1 and
v.type = 'p'
order by 1;
The master..spt_values table contains numbers between 0-2047 (when filtered by type='p').

If the numbers are not too large, then you can use the following trick in most databases:
select t.exdiff + seqnum
from t join
(select row_number() over (order by column_name) as seqnum
from INFORMATION_SCHEMA.columns
) nums
on t.exdiff <= seqnum
The use of INFORMATION_SCHEMA columns in the subquery is arbitrary. The only purpose is to generate a sequence of numbers at least as long as the maximum exdiff number.
This approach will work in any database that supports the ranking functions. Most databases have a database-specific way of generating a sequence (such as recursie CTEs in SQL Server and CONNECT BY in Oracle).

Related

Redshift: Generate a sequential range of numbers

I'm currently migrating PostgreSQL code from our existing DWH to new Redshift DWH and few queries are not compatible.
I have a table which has id, start_week, end_week and orders_each_week in a single row. I'm trying to generate a sequential series between the start_week and end_week so that I separate rows for each week between the give timeline.
Eg.,
This how it is present in the table
+----+------------+----------+------------------+
| ID | start_week | end_week | orders_each_week |
+----+------------+----------+------------------+
| 1 | 3 | 5 | 10 |
+----+------------+----------+------------------+
This is how I want to have it
+----+------+--------+
| ID | week | orders |
+----+------+--------+
| 1 | 3 | 10 |
+----+------+--------+
| 1 | 4 | 10 |
+----+------+--------+
| 1 | 5 | 10 |
+----+------+--------+
The code below is throwing error.
SELECT
id,
generate_series(start_week::BIGINT, end_week::BIGINT) AS demand_weeks
FROM client_demand
WHERE createddate::DATE >= '2021-01-01'
[0A000][500310] Amazon Invalid operation: Specified types or functions (one per INFO message) not supported on Redshift tables.;
[01000] Function "generate_series(bigint,bigint)" not supported.
So basically I am trying to find a sequential series between two numbers and I couldn't find any solution and any help here is really appreciated. Thank you
Gordon Linoff has shown a very common method for doing this and this approach has the advantage that the process isn't generating "rows" that don't already exist. This can make this faster than approaches that generate data on the fly. However, you need to have a table with about the right number of rows laying around and this isn't always the case. He also shows that this number series needs to be cross joined with your data to perform the function you need.
If you need to generate a large number of numbers in a series not using an existing table there are a number of ways to do this. Here's my go to approach:
WITH twofivesix AS (
SELECT
p0.n
+ p1.n * 2
+ p2.n * POWER(2,2)
+ p3.n * POWER(2,3)
+ p4.n * POWER(2,4)
+ p5.n * POWER(2,5)
+ p6.n * POWER(2,6)
+ p7.n * POWER(2,7)
as n
FROM
(SELECT 0 as n UNION SELECT 1) p0,
(SELECT 0 as n UNION SELECT 1) p1,
(SELECT 0 as n UNION SELECT 1) p2,
(SELECT 0 as n UNION SELECT 1) p3,
(SELECT 0 as n UNION SELECT 1) p4,
(SELECT 0 as n UNION SELECT 1) p5,
(SELECT 0 as n UNION SELECT 1) p6,
(SELECT 0 as n UNION SELECT 1) p7
),
fourbillion AS (
SELECT (a.n * POWER(256, 3) + b.n * POWER(256, 2) + c.n * 256 + d.n) as n
FROM twofivesix a,
twofivesix b,
twofivesix c,
twofivesix d
)
SELECT ...
This example makes a whole bunch of numbers (4B) but you can extend or reduce the number in the series by changing the number of times the tables are cross joined and by adding where clauses (as Gordon Linoff did). I don't expect you need a list anywhere close to this long but wanted to show how this can be used to make series that are very long. (You can also write with in base 10 if that makes more sense to you.)
So if you have a table with a more rows that you need number then this can be the fastest method but if you don't have such a table or table lengths vary over time you may want this pure SQL approach.
Among the many Postgres features that Redshift does not support is generate_series() (except on the master node). You can generate one yourself.
If you have a table with enough rows in Redshift, then I find that this approach works:
with n as (
select row_number() over () - 1 as n
from client_demand cd
)
select cd.id, cd.start_week + n.n as week, cd.orders_each_week
from client_demand cd join
n
on n.n <= (end_week - start_week);
This assumes that you have a table with enough rows to generate enough numbers for the on clause. If the table is really big, then add something like limit 100 in the n CTE to limit the size.
If there are only a handful of values, you can use:
select 0 as n union all
select 1 as n union all
select 2 as n

Split a row into multiple rows based on a column value SQL

I have the following "Order Table" :
Item Quantity
pencil 2
pen 1
Notebook 4
I need the result like :
Item Quantity
pencil 1
pencil 1
pen 1
Notebook 1
Notebook 1
Notebook 1
Notebook 1
You didn't specify which RDBMS you are using, so how you generate the numbers will depend on that (maybe a recursive CTE for SQL Server, using DUAL for Oracle, etc.). I've only written code to handle the data that you've shown, but you'll obviously need to account for numbers larger than four in the final solution.
SELECT
MT.sr_no,
MT.item_name,
1 AS quantity
FROM
My_Table MT
INNER JOIN
(
SELECT 1 AS nbr UNION ALL SELECT 2 AS nbr UNION ALL
SELECT 3 AS nbr UNION ALL SELECT 4 AS nbr
) N ON N.nbr <= MT.quantity
You can use the recursive query using common table expression to generate number duplicate rows according to the quantity field as below
WITH cte (sno,item,quantity,rnum)
AS
(
SELECT sno,item,quantity, 1 as rnum
FROM [Order]
UNION ALL
SELECT cte.sno,cte.item,cte.quantity, rnum+1
FROM [Order] JOIN cte ON [Order].sno = cte.sno
AND cte.rnum < [Order].quantity
)
SELECT item,1 AS Quantity
FROM cte
ORDER BY sno

Generate SQL rows

Given a number of types and a number of occurrences per type, I would like to generate something like this in T-SQL:
Occurrence | Type
-----------------
0 | A
1 | A
0 | B
1 | B
2 | B
Both the number of types and the number of occurrences per type are presented as values in different tables.
While I can do this with WHILE loops, I'm looking for a better solution.
Thanks!
This works with a number-table which i would use.
SELECT Occurrence = ROW_NUMBER() OVER (PARTITION BY Type ORDER BY Type) - 1
, Type
FROM Numbers num
INNER JOIN #temp1 t
ON num.n BETWEEN 1 AND t.Occurrence
Tested with this sample data:
create table #temp1(Type varchar(10),Occurrence int)
insert into #temp1 VALUES('A',2)
insert into #temp1 VALUES('B',3)
How to create a number-table? http://sqlperformance.com/2013/01/t-sql-queries/generate-a-set-1
If you have a table with the columns type and num, you have two approaches. One way is to use recursive CTEs:
with CTE as (
select type, 0 as occurrence, num
from table t
union all
select type, 1 + occurrence, num
from cte
where occurrence + 1 < num
)
select cte.*
from cte;
You may have to set the MAXRECURSION option, if the number exceeds 100.
The other way is to join in a numbers table. SQL Server uses spt_values for this purpose:
select s.number - 1 as occurrence, t.type
from table t join
spt_values s
on s.number <= t.num ;

How to check any missing number from a series of numbers?

I am doing a project creating an admission system for a college; the technologies are Java and Oracle.
In one of the tables, pre-generated serial numbers are stored. Later, against those serial numbers, the applicant's form data will be entered. My requirement is that when the entry process is completed I will have to generate a Lot wise report. If during feeding pre-generated serial numbers any sequence numbers went missing.
For example, say in a table, the sequence numbers are 7001, 7002, 7004, 7005, 7006, 7010.
From the above series it is clear that from 7001 to 7010 the numbers missing are 7003, 7007, 7008 and 7009
Is there any DBMS function available in Oracle to find out these numbers or if any stored procedure may fulfill my purpose then please suggest an algorithm.
I can find some techniques in Java but for speed I want to find the solution in Oracle.
A solution without hardcoding the 9:
select min_a - 1 + level
from ( select min(a) min_a
, max(a) max_a
from test1
)
connect by level <= max_a - min_a + 1
minus
select a
from test1
Results:
MIN_A-1+LEVEL
-------------
7003
7007
7008
7009
4 rows selected.
Try this:
SELECT t1.SequenceNumber + 1 AS "From",
MIN(t2.SequenceNumber) - 1 AS "To"
FROM MyTable t1
JOIN MyTable t2 ON t1.SequenceNumber < t2.SequenceNumber
GROUP BY t1.SequenceNumber
HAVING t1.SequenceNumber + 1 < MIN(t2.SequenceNumber)
Here is the result for the sequence 7001, 7002, 7004, 7005, 7006, 7010:
From To
7003 7003
7007 7009
This worked but selects the first sequence (start value) since it doesn't have predecessor. Tested in SQL Server but should work in Oracle
SELECT
s.sequence FROM seqs s
WHERE
s.sequence - (SELECT sequence FROM seqs WHERE sequence = s.sequence-1) IS NULL
Here is a test result
Table
-------------
7000
7001
7004
7005
7007
7008
Result
----------
7000
7004
7007
To get unassigned sequence, just do value[i] - 1 where i is greater first row e.g. (7004 - 1 = 7003 and 7007 - 1 = 7006) which are available sequences
I think you can improve on this simple query
This works on postgres >= 8.4. With some slight modifications to the CTE-syntax it could be made to work for oracle and microsoft, too.
-- EXPLAIN ANALYZE
WITH missing AS (
WITH RECURSIVE fullhouse AS (
SELECT MIN(num)+1 as num
FROM numbers n0
UNION ALL SELECT 1+ fh0.num AS num
FROM fullhouse fh0
WHERE EXISTS (
SELECT * FROM numbers ex
WHERE ex.num > fh0.num
)
)
SELECT * FROM fullhouse fh1
EXCEPT ( SELECT num FROM numbers nx)
)
SELECT * FROM missing;
Here's a solution that:
Relies on Oracle's LAG function
Does not require knowledge of the complete sequence (but thus doesn't detect if very first or last numbers in sequence were missed)
Lists the values surrounding the missing lists of numbers
Lists the missing lists of numbers as contiguous groups (perhaps convenient for reporting)
Tragically fails for very large lists of missing numbers, due to listagg limitations
SQL:
WITH MentionedValues /*this would just be your actual table, only defined here to provide data for this example */
AS (SELECT *
FROM ( SELECT LEVEL + 7000 seqnum
FROM DUAL
CONNECT BY LEVEL <= 10000)
WHERE seqnum NOT IN (7003,7007,7008,7009)--omit those four per example
),
Ranges /*identifies all ranges between adjacent rows*/
AS (SELECT seqnum AS seqnum_curr,
LAG (seqnum, 1) OVER (ORDER BY seqnum) AS seqnum_prev,
seqnum - (LAG (seqnum, 1) OVER (ORDER BY seqnum)) AS diff
FROM MentionedValues)
SELECT Ranges.*,
( SELECT LISTAGG (Ranges.seqnum_prev + LEVEL, ',') WITHIN GROUP (ORDER BY 1)
FROM DUAL
CONNECT BY LEVEL < Ranges.diff) "MissingValues" /*count from lower seqnum+1 up to lower_seqnum+(diff-1)*/
FROM Ranges
WHERE diff != 1 /*ignore when diff=1 because that means the numers are sequential without skipping any*/
;
Output:
SEQNUM_CURR SEQNUM_PREV DIFF MissingValues
7004 7002 2 "7003"
7010 7006 4 "7007,7008,7009"
One simple way to get your answer for your scenario is this:
create table test1 ( a number(9,0));
insert into test1 values (7001);
insert into test1 values (7002);
insert into test1 values (7004);
insert into test1 values (7005);
insert into test1 values (7006);
insert into test1 values (7010);
commit;
select n.n from (select ROWNUM + 7001 as n from dual connect by level <= 9) n
left join test1 t on n.n = t.a where t.a is null;
The select will give you the answer from your example. This only makes sense, if you know in advance in which range your numbers are and the range should not too big. The first number must be the offset in the ROWNUM part and the length of the sequence is the limit to the level in the connect by part.
I would have suggested connect by level as Stefan has done, however, you can't use a sub-query in this statement, which means that it isn't really suitable for you as you need to know what the maximum and minimum values of your sequence are.
I would suggest a pipe-lined table function might be the best way to generate the numbers you need to do the join. In order for this to work you'd need an object in your database to return the values to:
create or replace type t_num_array as table of number;
Then the function:
create or replace function generate_serial_nos return t_num_array pipelined is
l_first number;
l_last number;
begin
select min(serial_no), max_serial_no)
into l_first, l_last
from my_table
;
for i in l_first .. l_last loop
pipe row(i);
end loop;
return;
end generate_serial_nos;
/
Using this function the following would return a list of serial numbers, between the minimum and maximum.
select * from table(generate_serial_nos);
Which means that your query to find out which serial numbers are missing becomes:
select serial_no
from ( select *
from table(generate_serial_nos)
) generator
left outer join my_table actual
on generator.column_value = actual.serial_no
where actual.serial_no is null
SELECT ROWNUM "Missing_Numbers" FROM dual CONNECT BY LEVEL <= (SELECT MAX(a) FROM test1)
MINUS
SELECT a FROM test1 ;
Improved query is:
SELECT ROWNUM "Missing_Numbers" FROM dual CONNECT BY LEVEL <= (SELECT MAX(a) FROM test1)
MINUS
SELECT ROWNUM "Missing_Numbers" FROM dual CONNECT BY LEVEL < (SELECT Min(a) FROM test1)
MINUS
SELECT a FROM test1;
Note: a is column in which we find missing value.
Try with a subquery:
SELECT A.EMPNO + 1 AS MissingEmpNo
FROM tblEmpMaster AS A
WHERE A.EMPNO + 1 NOT IN (SELECT EMPNO FROM tblEmpMaster)
select A.ID + 1 As ID
From [Missing] As A
Where A.ID + 1 Not IN (Select ID from [Missing])
And A.ID < n
Data: ID
1
2
5
7
Result: ID
3
4
6

Generating Random Number In Each Row In Oracle Query

I want to select all rows of a table followed by a random number between 1 to 9:
select t.*, (select dbms_random.value(1,9) num from dual) as RandomNumber
from myTable t
But the random number is the same from row to row, only different from each run of the query. How do I make the number different from row to row in the same execution?
Something like?
select t.*, round(dbms_random.value() * 8) + 1 from foo t;
Edit:
David has pointed out this gives uneven distribution for 1 and 9.
As he points out, the following gives a better distribution:
select t.*, floor(dbms_random.value(1, 10)) from foo t;
At first I thought that this would work:
select DBMS_Random.Value(1,9) output
from ...
However, this does not generate an even distribution of output values:
select output,
count(*)
from (
select round(dbms_random.value(1,9)) output
from dual
connect by level <= 1000000)
group by output
order by 1
1 62423
2 125302
3 125038
4 125207
5 124892
6 124235
7 124832
8 125514
9 62557
The reasons are pretty obvious I think.
I'd suggest using something like:
floor(dbms_random.value(1,10))
Hence:
select output,
count(*)
from (
select floor(dbms_random.value(1,10)) output
from dual
connect by level <= 1000000)
group by output
order by 1
1 111038
2 110912
3 111155
4 111125
5 111084
6 111328
7 110873
8 111532
9 110953
you don’t need a select … from dual, just write:
SELECT t.*, dbms_random.value(1,9) RandomNumber
FROM myTable t
If you just use round then the two end numbers (1 and 9) will occur less frequently, to get an even distribution of integers between 1 and 9 then:
SELECT MOD(Round(DBMS_RANDOM.Value(1, 99)), 9) + 1 FROM DUAL