Selecting a sequence in SQL - sql

There seem to be a few blog posts on this topic, but the solutions are not very intuitive. Surely there's a "canonical" way?
I'm using Teradata SQL.
How would I select
A range of numbers
A date range
E.g.
SELECT 1:10 AS Nums
SELECT 1-1-2010:5-1-2014 AS Dates1
The result would be 10 rows (1 - 10) in the first SELECT query and ~(365 * 4.3) rows in the second?

The "canonical" way to do this in SQL is using recursive CTEs, which the more recent versions of Teradata support.
For your first example:
with recursive nums(n) as (
select 1 as n
union all
select n + 1
from nums
where n < 10
)
select *
from nums;
You can do something similar for dates.
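As a rough illustration of the date variant, here is a minimal sketch that runs a recursive CTE through Python's sqlite3 module. SQLite is only a stand-in here: the date() function and the '+1 day' modifier are SQLite syntax, not Teradata, and the helper name date_range is hypothetical.

```python
import sqlite3


def date_range(start, end):
    """Return every ISO date from start to end (inclusive) using a recursive CTE."""
    sql = """
        WITH RECURSIVE dates(d) AS (
            SELECT date(:start)            -- anchor member, the first day
            UNION ALL
            SELECT date(d, '+1 day')       -- recursive member, one day forward
            FROM dates
            WHERE d < date(:end)           -- stop once the end date has been produced
        )
        SELECT d FROM dates ORDER BY d
    """
    conn = sqlite3.connect(":memory:")
    try:
        return [row[0] for row in conn.execute(sql, {"start": start, "end": end})]
    finally:
        conn.close()


days = date_range("2010-01-01", "2014-05-01")
print(len(days), days[0], days[-1])
```

The shape is the same as the numeric version: one anchor row, one step, and a stop condition.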
EDIT:
You can also do this by using row_number() and an existing table:
with nums(n) as (
select n
from (select row_number() over (order by col) as n
from ExistingTable t
) t
where n <= 10
)
select *
from nums;
ExistingTable is just any table with enough rows. The best choice of col is the primary key.
with digits(n) as (
select 1 as n union all select 2 union all select 3 union all select 4 union all select 5 union all
select 6 union all select 7 union all select 8 union all select 9 union all select 10
)
select *
from digits;
If your version of Teradata supports multiple CTEs, you can build on the above:
with digits(n) as (
select 1 as n union all select 2 union all select 3 union all select 4 union all select 5 union all
select 6 union all select 7 union all select 8 union all select 9 union all select 10
),
nums(n) as (
select (d1.n - 1)*100 + (d2.n - 1)*10 + d3.n
from digits d1 cross join digits d2 cross join digits d3
)
select *
from nums;
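To check what a digits cross join produces, the following sketch runs the same idea in SQLite through Python. It assumes digits 0 to 9 (rather than 1 to 10) and adds 1 at the end, so the result is exactly 1 to 1000; the function name is hypothetical.

```python
import sqlite3

SQL = """
WITH digits(n) AS (
    SELECT 0 UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL
    SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9
),
nums(n) AS (
    -- hundreds, tens and units digit combined, shifted by 1 to start at 1
    SELECT d1.n * 100 + d2.n * 10 + d3.n + 1
    FROM digits d1 CROSS JOIN digits d2 CROSS JOIN digits d3
)
SELECT n FROM nums ORDER BY n
"""


def numbers_1_to_1000():
    """Run the cross join and return the generated numbers as a list."""
    conn = sqlite3.connect(":memory:")
    try:
        return [row[0] for row in conn.execute(SQL)]
    finally:
        conn.close()


nums = numbers_1_to_1000()
print(nums[0], nums[-1], len(nums))
```

Each extra cross join multiplies the row count by ten, so four joins would give 10,000 rows.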

In Teradata you can use the existing sys_calendar to get those dates:
SELECT calendar_date
FROM sys_calendar.CALENDAR
WHERE calendar_date BETWEEN DATE '2010-01-01' AND DATE '2014-05-01';
Note:
DATE '2010-01-01' is the only recommended way to write a date in Teradata
There's probably another custom calendar for the specific business needs of your company, too. Everyone will have access rights to it.
You might also use this for the range of numbers:
SELECT day_of_calendar
FROM sys_calendar.CALENDAR
WHERE day_of_calendar BETWEEN 1 AND 10;
But you should check Explain to see if the estimated number of rows is correct. sys_calendar is a kind of template and day_of_calendar is a calculated column, so no statistics exist on it and Explain will return an estimated number of 14683 (20 percent of the number of rows in that table) instead of 10. If you use it in additional joins, the optimizer might produce a bad plan based on that totally wrong number.
Note:
If you use sys_calendar you are limited to a maximum of 73414 rows: dates between 1900-01-01 and 2100-12-31 and numbers between 1 and 73414. Your business calendar might vary.
Gordon Linoff's recursive query is not really efficient in Teradata, as it does sequential row-by-row processing in a parallel database (each loop is an "all-AMPs step" in Explain) and the optimizer doesn't know how many rows will be returned.
If you need those ranges regularly, you might consider creating a numbers table. I usually have one with a million rows, or I use my calendar with the full range of 10000 years :-)
--DROP TABLE nums;
CREATE TABLE nums(n INT NOT NULL PRIMARY KEY CHECK (n BETWEEN 0 AND 999999));
INSERT INTO Nums
WITH cte(n) AS
(
SELECT day_of_calendar - 1
FROM sys_calendar.CALENDAR
WHERE day_of_calendar BETWEEN 1 AND 1000
)
SELECT
t1.n +
t2.n * 1000
FROM cte t1 CROSS JOIN cte t2;
COLLECT STATISTICS COLUMN(n) ON Nums;
The COLLECT STATS is the most important step to get correct estimates.
Now it's a simple
SELECT n FROM nums WHERE n BETWEEN 1 AND 10;
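A numbers table can also drive a date range by adding each number as a day offset to a start date. The sketch below shows that idea in SQLite via Python; the table is filled from Python instead of sys_calendar, the size of 2000 rows is an arbitrary assumption, and the date arithmetic uses SQLite's date() and julianday() functions rather than Teradata syntax.

```python
import sqlite3


def build_numbers_table(conn, size=2000):
    """Create nums(n) holding 0 .. size-1 (size is an arbitrary choice for this sketch)."""
    conn.execute("CREATE TABLE nums (n INTEGER NOT NULL PRIMARY KEY)")
    conn.executemany("INSERT INTO nums (n) VALUES (?)", ((i,) for i in range(size)))


def dates_between(conn, start, end):
    """Turn the numbers into dates by adding n days to the start date."""
    sql = """
        SELECT date(:start, '+' || n || ' days') AS d
        FROM nums
        WHERE n <= julianday(:end) - julianday(:start)   -- offsets up to the end date
        ORDER BY n
    """
    return [row[0] for row in conn.execute(sql, {"start": start, "end": end})]


conn = sqlite3.connect(":memory:")
build_numbers_table(conn)
dates = dates_between(conn, "2010-01-01", "2014-05-01")
print(len(dates), dates[0], dates[-1])
```

The range is silently truncated if the numbers table has fewer rows than the range has days, so the table should be sized for the largest range you expect.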
There's also a nice UDF on GitHub for creating sequences which is easy to use:
SELECT DATE '2010-01-01' + SEQUENCE
FROM TABLE(gen_sequence(0,DATE '2014-05-01' - DATE '2010-01-01')) AS t;
SELECT SEQUENCE
FROM TABLE(gen_sequence(1,10)) AS t;
But it's usually hard to convince your DBA to install any C-UDFs and the number of rows returned is unknown again.

To generate the sequence 1 to 10:
sel sum (1) over (ROWS UNBOUNDED PRECEDING) as seq_val
from sys_calendar.CALENDAR
qualify row_number () over (order by 1)<=10

Related

Using INSERT with CTE

For a somewhat complex SQL script I need the following mapping:
WITH days_mapping AS (SELECT 1 AS day
UNION ALL
SELECT 2 AS day
UNION ALL
...
SELECT 31 AS day)
Is there any way to create the same mapping but without manually writing a SELECT and UNION ALL for every single number/day that should be in this mapping? I was thinking of doing an INSERT in a WHILE loop instead of the SELECT but I don't know how or if it is even possible to do that with common table expressions.
You can use a recursive CTE:
with days_mapping as (
select 1 as day
union all
select day + 1
from days_mapping
where day < 31
)
select *
from days_mapping;
Note: If more than 100 rows are being generated, you need to use option (maxrecursion 0) at the end of the query.
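Since the question title mentions INSERT, here is a hedged sketch of feeding a recursive CTE straight into an INSERT. It uses SQLite through Python, where the WITH clause goes in front of the INSERT; other databases place it differently, so treat the statement order as SQLite-specific. Materializing days_mapping as a real table and naming the CTE seq are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE days_mapping (day INTEGER PRIMARY KEY)")

# The recursive CTE generates 1..31 and the INSERT consumes it in one statement.
conn.execute("""
    WITH RECURSIVE seq(day) AS (
        SELECT 1
        UNION ALL
        SELECT day + 1 FROM seq WHERE day < 31
    )
    INSERT INTO days_mapping (day)
    SELECT day FROM seq
""")

mapped_days = [row[0] for row in conn.execute("SELECT day FROM days_mapping ORDER BY day")]
print(mapped_days[0], mapped_days[-1], len(mapped_days))
```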

Find missing values in a sequence (sql)

Table1
Empid number
----------------
100 1
100 2
100 4
100 5
100 6
101 1
I'm self-learning SQL, and a task I've come across is finding the missing values in a sequence up to 12 and outputting which empid each is associated with.
I've attempted an approach that takes the above table and starts like
SELECT a number +1 , Min("through), MIn(by number) - 1
The entire approach uses the existing numbers to find the missing next/previous number. I'm able to output which numbers are missing. However, I do not know how to group them with the associated id.
I also feel like I've complicated the task. I'm looking for guidance from anyone who can help on the best / most efficient way of going about this.
Assuming that all empids and numbers are in the table somewhere, you can do this with a cross join and filter. In MS Access, this looks like:
select e.empid, n.number
from (select distinct empid from t) as e,
(select distinct number from t) as n
where not exists (select 1
from t
where t.empid = e.empid and t.number = n.number
);
This will not quite work for the data you have supplied. To handle that situation, you need a table that has the 12 numbers you are looking for.
This assumes you create a NUMBERS table with a Number column holding 12 records, with values 1 to 12.
SELECT N.*, E.*
FROM NUMBERS N
CROSS JOIN (SELECT Distinct EmpID FROM table1) E
LEFT JOIN table1 T
on T.EmpID = E.EmpID
and T.Number = N.Number
WHERE T.EmpID is null
Or substitute a derived table for the numbers table above,
something like:
(Select 1 as Number UNION ALL
Select 2 as Number UNION ALL
Select 3 as Number UNION ALL
Select 4 as Number UNION ALL
Select 5 as Number UNION ALL
Select 6 as Number UNION ALL
Select 7 as Number UNION ALL
Select 8 as Number UNION ALL
Select 9 as Number UNION ALL
Select 10 as Number UNION ALL
Select 11 as Number UNION ALL
Select 12 as Number)
I can't remember if MS Access will let you do this though...
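To see the cross join plus LEFT JOIN approach end to end, the sketch below loads the sample rows from the question into SQLite via Python and generates 1 to 12 with a recursive CTE instead of a stored numbers table. SQLite stands in for MS Access here, which is an assumption of the example; Access itself does not support recursive CTEs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (empid INTEGER, number INTEGER)")
conn.executemany(
    "INSERT INTO table1 (empid, number) VALUES (?, ?)",
    [(100, 1), (100, 2), (100, 4), (100, 5), (100, 6), (101, 1)],
)

MISSING_SQL = """
    WITH RECURSIVE numbers(number) AS (
        SELECT 1
        UNION ALL
        SELECT number + 1 FROM numbers WHERE number < 12
    )
    SELECT e.empid, n.number
    FROM (SELECT DISTINCT empid FROM table1) AS e
    CROSS JOIN numbers AS n                 -- every empid paired with 1..12
    LEFT JOIN table1 AS t
           ON t.empid = e.empid AND t.number = n.number
    WHERE t.empid IS NULL                   -- keep only pairs with no matching row
    ORDER BY e.empid, n.number
"""

missing = conn.execute(MISSING_SQL).fetchall()
for empid, number in missing:
    print(empid, number)
```

Each output row already carries the empid, which answers the grouping part of the question.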

How to find a number that isn't there?

We have a custom number field on a training record, the number is recorded sequentially but there are gaps. How do I find those gaps? Consider this pseudocode
SELECT MIN(X)
FROM DUAL
WHERE X BETWEEN 1 AND 999999
AND X NOT IN (SELECT AG_TRNID
FROM PS_TRAINING
WHERE AG_TRNID = X)
This doesn't work, "X" is unknown.
Thanks!
Bruce
This answer assumes that you're using Oracle.
The way to solve this is to create a resultset that contains all of the numbers in your range, then join to that. The way to do this is to use a recursive query:
SELECT LEVEL AS x
FROM DUAL
CONNECT BY LEVEL <= 999999
CONNECT BY is Oracle-specific syntax that tells the query to run recursively as long as the predicate is true. LEVEL is a pseudo-column, available only in queries that use CONNECT BY, that indicates the current level of recursion. The end result is that this query returns 999,999 rows from DUAL, each one a level deeper in the recursion.
Given this method of generating numbers, plugging it into the query you tried earlier is pretty trivial:
SELECT MIN (x)
FROM (SELECT LEVEL AS x
FROM DUAL
CONNECT BY LEVEL <= 999999)
WHERE x NOT IN (SELECT ag_trnid
FROM ps_training
WHERE ag_trnid = x)
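For databases without CONNECT BY, the same "generate the candidates, then look for the first one that is absent" idea can be sketched with a standard recursive CTE. The example below uses SQLite through Python; the sample ids and the upper bound of 999 are made up for illustration, and first_missing is a hypothetical helper name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ps_training (ag_trnid INTEGER PRIMARY KEY)")
conn.executemany(
    "INSERT INTO ps_training (ag_trnid) VALUES (?)",
    [(1,), (2,), (3,), (5,), (6,), (10,)],   # illustrative data with gaps
)


def first_missing(conn, upper):
    """Return the smallest number in 1..upper that is not an ag_trnid, or None."""
    sql = """
        WITH RECURSIVE candidates(x) AS (
            SELECT 1
            UNION ALL
            SELECT x + 1 FROM candidates WHERE x < :upper
        )
        SELECT MIN(c.x)
        FROM candidates AS c
        WHERE NOT EXISTS (
            SELECT 1 FROM ps_training AS p WHERE p.ag_trnid = c.x
        )
    """
    return conn.execute(sql, {"upper": upper}).fetchone()[0]


print(first_missing(conn, 999))
```

NOT EXISTS is used instead of NOT IN so that a NULL id in the table cannot make the filter reject every candidate.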
Two quick examples. The first is your classic Gaps-and-Islands, and the second will use an ad-hoc tally table to identify the missing elements via a LEFT JOIN.
The following was created in SQL Server, but if your database supports the window functions, it should be a small task to adapt.
Gaps and Islands
Declare @YourTable table (X int)
Insert Into @YourTable values
(1),(2),(3),(5),(6),(10)
Select X1 = min(X)
,X2 = max(X)
From (
Select *
,Grp = X - Row_Number() over (Order by X)
From @YourTable
) A
Group By Grp
Returns
X1 X2
1 3
5 6
10 10
Ad-hoc Tally Table
Declare @YourTable table (X int)
Insert Into @YourTable values
(1),(2),(3),(5),(6),(10)
;with cte0(N) As (Select 1 From (Values(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) N(N)),
cteN(N) As (Select Row_Number() over (Order By (Select NULL)) From cte0 N1, cte0 N2, cte0 N3, cte0 N4, cte0 N5, cte0 N6) -- 1 Million
Select N
From cteN
Left Join @YourTable on X=N
Where X is Null
and N<=(Select max(X) from @YourTable)
Order By N
Returns
N
4
7
8
9
Create all numbers from min(ag_trnid) to max(ag_trnid). From these remove the existing numbers:
with nums(num, maxnum) as
(
select min(ag_trnid) as num, max(ag_trnid) as maxnum from ps_training
union all
select num + 1, maxnum from nums where num < maxnum
)
select num from nums
minus
select ag_trnid from ps_training;
Start with 1 (or 0 for that matter) instead of min(ag_trnid), if you consider numbers before the minimum ag_trnid gaps, too.
I think there's no other way than to create a table of ids to find the gaps:
declare @idMin bigint
declare @idMax bigint
set @idMin = (select min(AG_TRNID) from PS_TRAINING)
set @idMax = (select max(AG_TRNID) from PS_TRAINING)
create table numbers
(id bigint)
while (@idMax>@idMin)
begin
insert into numbers values(@idMin)
set @idMin=@idMin+1
end
select id from numbers
where id not in (SELECT AG_TRNID FROM PS_TRAINING)
You can create a "virtual" table that contains all numbers from 1 to 999999 then remove those that are present in your table:
select level
from dual
connect by level <= 999999
minus
select ag_trnid
from ps_training;
The above will output all gaps, not just the first one.
The connect by level <= 999999 is an undocumented trick to generate all numbers from 1 to 999999.
Here's one way of finding the start and end numbers of the gaps:
WITH sample_data AS (SELECT 1 ID FROM dual UNION ALL
SELECT 2 ID FROM dual UNION ALL
SELECT 4 ID FROM dual UNION ALL
SELECT 5 ID FROM dual UNION ALL
SELECT 8 ID FROM dual UNION ALL
SELECT 10 ID FROM dual UNION ALL
SELECT 20 ID FROM dual UNION ALL
SELECT 22 ID FROM dual UNION ALL
SELECT 23 ID FROM dual UNION ALL
SELECT 27 ID FROM dual UNION ALL
SELECT 28 ID FROM dual)
-- end of mimicking data in a table called "sample_data"
-- see SQL below:
SELECT prev_id + 1 first_number_in_gap,
ID - 1 last_number_in_gap
FROM (SELECT ID,
LAG(ID, 1, ID - 1) OVER (ORDER BY ID) prev_id
FROM sample_data)
WHERE ID - prev_id > 1;
FIRST_NUMBER_IN_GAP LAST_NUMBER_IN_GAP
------------------- ------------------
3 3
6 7
9 9
11 19
21 21
24 26
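If window functions such as LAG are not available, the same gap ranges can be found with plain subqueries: a gap starts after every id whose successor is absent and ends just before the next id that does exist. The sketch below is one way to write that, run in SQLite via Python on the same sample ids; it is an alternative formulation, not the query shown above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sample_data (id INTEGER PRIMARY KEY)")
conn.executemany(
    "INSERT INTO sample_data (id) VALUES (?)",
    [(i,) for i in (1, 2, 4, 5, 8, 10, 20, 22, 23, 27, 28)],
)

GAPS_SQL = """
    SELECT t.id + 1 AS first_number_in_gap,
           (SELECT MIN(n.id) FROM sample_data AS n WHERE n.id > t.id) - 1
               AS last_number_in_gap
    FROM sample_data AS t
    WHERE NOT EXISTS (SELECT 1 FROM sample_data AS x WHERE x.id = t.id + 1)  -- successor absent
      AND t.id < (SELECT MAX(id) FROM sample_data)                           -- skip the last id
    ORDER BY t.id
"""

gaps = conn.execute(GAPS_SQL).fetchall()
for first, last in gaps:
    print(first, last)
```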
In the end I created a table with all possible values and selected the minimum from it that doesn't exist in the base table. It works well. I had hoped for a SQL statement to avoid creating yet another object but couldn't find one.
Yes this is Oracle, I neglected to mention.
Thank you all for the suggestions and assistance, it is much appreciated!

Inserting rows where column can have many values

I am writing a stored proc that inserts rows into a table. The issue is that many of the columns can have a list of different values and all of the rows in the db need to reflect these values. For example:
I have a table: Table1(state, number)
state will need to be 1-50 as its value and number is 1-3. There needs to be a row for each state with each number.
(1,1)
(1,2)
(1,3)
(2,1)...etc
There has got to be a nice way to do this but my research has not been fruitful. Does anyone have any suggestions?
A good way to generate the values is using a cross join. Here is an example:
insert into table(state, number)
select s.state, n.number
from (select 'AK' as state union all select 'AL' union all . . .
) s cross join
(select 1 as number union all select 2 union all select 3
) n
You may already have a list of states and/or numbers, in which case you can use that. For example:
insert into table(state, number)
select s.state, n.number
from (select state from states
) s cross join
(select 1 as number union all select 2 union all select 3
) n
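When the states are simply numbered 1 to 50, as in the question, both lists can be generated rather than typed out. The following is a minimal sketch in SQLite via Python, assuming integer state values and a target table named table1; the WITH-before-INSERT order is SQLite syntax and may differ in other databases.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (state INTEGER, number INTEGER)")

# Two small recursive CTEs produce 1..50 and 1..3; the cross join pairs them up.
conn.execute("""
    WITH RECURSIVE
        states(state) AS (
            SELECT 1 UNION ALL SELECT state + 1 FROM states WHERE state < 50
        ),
        numbers(number) AS (
            SELECT 1 UNION ALL SELECT number + 1 FROM numbers WHERE number < 3
        )
    INSERT INTO table1 (state, number)
    SELECT s.state, n.number
    FROM states AS s CROSS JOIN numbers AS n
""")

pairs = conn.execute("SELECT state, number FROM table1 ORDER BY state, number").fetchall()
print(len(pairs), pairs[:4])
```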
Your need is a cross join between two tables, one containing 50 rows, the other 3 rows.
In Oracle:
select *
from
(
select rownum as state
from dual
connect by rownum <= 50
) t1
,
(
select rownum as num
from dual
connect by rownum <= 3
) t2

MySQL RAND() & LIMIT

My database table has 15 records and I want to show 9 at random on screen.
SELECT * FROM tablename ORDER BY RAND() LIMIT 9
This works as expected, but what if the table only has 9 records? I need to pull 15 random records.
I understand this will duplicate one or more records, but that's my intention.
Your select will only pull the number of records in the table regardless of the order by. You can use various means to duplicate the table data, however, before you order them. For example, union all of the rows together twice:
select * from
(
select * from tablename
union all
select * from tablename
) as tmp
order by rand() limit 9
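The doubling idea can be tried out quickly with the sketch below, which assumes the goal is 15 rows from a 9-row table. It runs in SQLite via Python, so random() replaces MySQL's RAND(); the id and label columns are made up for the example.

```python
import sqlite3
from collections import Counter

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tablename (id INTEGER PRIMARY KEY, label TEXT)")
conn.executemany(
    "INSERT INTO tablename (id, label) VALUES (?, ?)",
    [(i, "row %d" % i) for i in range(1, 10)],   # 9 illustrative rows
)


def random_rows_with_repeats(conn, wanted=15):
    """Stack the table on itself, shuffle, and keep the first `wanted` rows."""
    sql = """
        SELECT id, label
        FROM (
            SELECT id, label FROM tablename
            UNION ALL
            SELECT id, label FROM tablename
        ) AS doubled
        ORDER BY random()
        LIMIT ?
    """
    return conn.execute(sql, (wanted,)).fetchall()


rows = random_rows_with_repeats(conn)
counts = Counter(row[0] for row in rows)
print(len(rows), max(counts.values()))
```

With two copies of the table, no row can appear more than twice, and at most 18 rows can be returned from 9.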
ORDER BY RAND() is not efficient when you deal with a large table.
A better way of doing such a query is to:
1. Query the largest id (assuming id is the unique key).
2. Use a JavaScript or PHP function to generate 15 random numbers from 1 to max_id and push them to an array.
3. Implode the array (e.g. $id_list = "'".implode("', '", $id_list)."'").
4. Select * from tablename where id in ($id_list).
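The steps above can be sketched in application code as follows. This is a variant under stated assumptions rather than a literal translation: it is written in Python with sqlite3 instead of PHP, it samples from the ids that actually exist (so gaps in the id column cannot produce misses), and it looks rows up one by one because an IN list returns each matching row only once, even if an id was drawn twice. The function name and sample table are hypothetical.

```python
import random
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tablename (id INTEGER PRIMARY KEY, label TEXT)")
conn.executemany(
    "INSERT INTO tablename (id, label) VALUES (?, ?)",
    [(i, "row %d" % i) for i in (1, 2, 3, 5, 8, 9, 12, 13, 20)],   # ids with gaps
)


def pick_random_rows(conn, wanted=15):
    """Draw `wanted` ids with replacement in Python, then fetch each row by key."""
    ids = [row[0] for row in conn.execute("SELECT id FROM tablename")]
    if not ids:
        return []                                  # nothing to sample from
    chosen = random.choices(ids, k=wanted)         # duplicates are allowed
    return [
        conn.execute("SELECT id, label FROM tablename WHERE id = ?", (i,)).fetchone()
        for i in chosen
    ]


picked = pick_random_rows(conn)
print(len(picked))
```

Binding each id as a parameter also avoids building the SQL string by hand, which is where the implode approach can go wrong.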
This will work even if you have only 1 row in the table. If you have fewer than 15 (say 11) rows, you'll have all 11 in the result plus 4 more random ones:
SELECT col1, col2, ..., colN -- the columns of `tablename` only
FROM
( SELECT a.i, b.j, t.*
FROM
( SELECT *, RAND() AS rnd
FROM tablename
ORDER BY rnd LIMIT 15
) AS t
CROSS JOIN
( SELECT 1 AS i UNION ALL SELECT 2 UNION ALL
SELECT 3 UNION ALL SELECT 4 )
AS a
CROSS JOIN
( SELECT 1 AS j UNION ALL SELECT 2 UNION ALL
SELECT 3 UNION ALL SELECT 4 )
AS b
ORDER BY i, j, rnd
LIMIT 15
) AS t15
ORDER BY RAND() ;
If you want "more" randomness, having duplicate or triplicate rows in the results with possibly some rows not shown at all, replace the last five lines with:
AS b
ORDER BY RAND()
LIMIT 15
) AS t15 ;