Redshift: Generate a sequential range of numbers

I'm currently migrating PostgreSQL code from our existing DWH to a new Redshift DWH, and a few queries are not compatible.
I have a table which has id, start_week, end_week and orders_each_week in a single row. I'm trying to generate a sequential series between start_week and end_week so that I get a separate row for each week in the given timeline.
E.g., this is how it is present in the table:
+----+------------+----------+------------------+
| ID | start_week | end_week | orders_each_week |
+----+------------+----------+------------------+
|  1 |          3 |        5 |               10 |
+----+------------+----------+------------------+
This is how I want to have it
+----+------+--------+
| ID | week | orders |
+----+------+--------+
|  1 |    3 |     10 |
|  1 |    4 |     10 |
|  1 |    5 |     10 |
+----+------+--------+
The code below throws an error:
SELECT
    id,
    generate_series(start_week::BIGINT, end_week::BIGINT) AS demand_weeks
FROM client_demand
WHERE createddate::DATE >= '2021-01-01'
[0A000][500310] Amazon Invalid operation: Specified types or functions (one per INFO message) not supported on Redshift tables.;
[01000] Function "generate_series(bigint,bigint)" not supported.
So basically I am trying to generate a sequential series between two numbers, and I couldn't find a solution. Any help here is really appreciated. Thank you.

Gordon Linoff has shown a very common method for doing this, and his approach has the advantage that it isn't generating "rows" that don't already exist, which can make it faster than approaches that generate data on the fly. However, you need to have a table with about the right number of rows lying around, and that isn't always the case. He also shows that the number series needs to be cross joined with your data to do what you need.
If you need to generate a large series of numbers without using an existing table, there are a number of ways to do it. Here's my go-to approach:
WITH twofivesix AS (
    SELECT p0.n
         + p1.n * 2
         + p2.n * POWER(2,2)
         + p3.n * POWER(2,3)
         + p4.n * POWER(2,4)
         + p5.n * POWER(2,5)
         + p6.n * POWER(2,6)
         + p7.n * POWER(2,7)
         AS n
    FROM (SELECT 0 as n UNION SELECT 1) p0,
         (SELECT 0 as n UNION SELECT 1) p1,
         (SELECT 0 as n UNION SELECT 1) p2,
         (SELECT 0 as n UNION SELECT 1) p3,
         (SELECT 0 as n UNION SELECT 1) p4,
         (SELECT 0 as n UNION SELECT 1) p5,
         (SELECT 0 as n UNION SELECT 1) p6,
         (SELECT 0 as n UNION SELECT 1) p7
),
fourbillion AS (
    SELECT (a.n * POWER(256, 3) + b.n * POWER(256, 2) + c.n * 256 + d.n) as n
    FROM twofivesix a,
         twofivesix b,
         twofivesix c,
         twofivesix d
)
SELECT ...
This example makes a whole bunch of numbers (4B), but you can extend or reduce the length of the series by changing how many times the tables are cross joined and by adding WHERE clauses (as Gordon Linoff did). I don't expect you need a list anywhere close to this long, but I wanted to show how this can be used to make very long series. (You can also write this in base 10 if that makes more sense to you.)
So if you have a table with more rows than the count of numbers you need, then Gordon's method can be the fastest, but if you don't have such a table, or if table lengths vary over time, you may want this pure-SQL approach.
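For the question's table specifically, a 256-number version of the same trick is more than enough; this is a sketch that reuses the client_demand table and column names from the question:
WITH nums AS (
    -- 0..255, built the same way as twofivesix above
    SELECT p0.n + p1.n * 2 + p2.n * 4 + p3.n * 8
         + p4.n * 16 + p5.n * 32 + p6.n * 64 + p7.n * 128 AS n
    FROM (SELECT 0 AS n UNION SELECT 1) p0,
         (SELECT 0 AS n UNION SELECT 1) p1,
         (SELECT 0 AS n UNION SELECT 1) p2,
         (SELECT 0 AS n UNION SELECT 1) p3,
         (SELECT 0 AS n UNION SELECT 1) p4,
         (SELECT 0 AS n UNION SELECT 1) p5,
         (SELECT 0 AS n UNION SELECT 1) p6,
         (SELECT 0 AS n UNION SELECT 1) p7
)
SELECT cd.id,
       cd.start_week + nums.n AS week,
       cd.orders_each_week   AS orders
FROM client_demand cd
JOIN nums
  ON nums.n <= cd.end_week - cd.start_week;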

Among the many Postgres features that Redshift does not support is generate_series() (it works only on the leader node, which is why it fails as soon as a user table is involved). You can generate one yourself.
If you have a table with enough rows in Redshift, then I find that this approach works:
with n as (
      select row_number() over () - 1 as n
      from client_demand cd
     )
select cd.id, cd.start_week + n.n as week, cd.orders_each_week
from client_demand cd join
     n
     on n.n <= (end_week - start_week);
This assumes that you have a table with enough rows to generate enough numbers for the on clause. If the table is really big, then add something like limit 100 in the n CTE to limit the size, as shown below.
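For example, with the cap in place (a sketch; 100 just needs to be at least the largest week span in your data):
with n as (
      select row_number() over () - 1 as n
      from client_demand cd
      limit 100
     )
select cd.id, cd.start_week + n.n as week, cd.orders_each_week
from client_demand cd join
     n
     on n.n <= (end_week - start_week);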
If there are only a handful of values, you can use:
select 0 as n union all
select 1 as n union all
select 2 as n
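Plugged into the same join, that looks like this (a sketch reusing the question's column names):
select cd.id, cd.start_week + n.n as week, cd.orders_each_week
from client_demand cd join
     (select 0 as n union all
      select 1 as n union all
      select 2 as n
     ) n
     on n.n <= (cd.end_week - cd.start_week);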

Related

SQL Connect By Level sometimes works, sometimes doesn't; can't understand why

I am trying to run the query below in Oracle. If I change the ROUND to 0 decimal places I get a result, but any time there are decimals I get nothing back when using the CONNECT BY LEVEL part. If I run just the subquery that comes after n.n = on its own, I get the result.
The reason I am trying to use CONNECT BY LEVEL is that I have a requirement to put my entire query into the WHERE clause: in the application there is a restriction that prevents me from using the GROUP BY clause I need.
SELECT n.n
FROM ( SELECT TO_NUMBER(LEVEL) - 1 n FROM DUAL CONNECT BY LEVEL <= 1000 ) n
WHERE n.n =
(subquery)
Examples of HOURS values which work are whole numbers, so when these are summed the result is still a whole number:
5
10
5
5
20
But where I have seen the query not work is where I have decimal values such as:
3.68
2.45
5
10
5
Table: ASSIGNMENTS_M
Table: RESULT_VALUES
Columns: Result_ID, Assignment_ID, Date_Earned, Hours
INSERT ALL
  INTO RESULT_VALUES (Result_ID, Assignment_ID, Date_Earned, Hours) VALUES (50,   123456, to_date('01/02/2020', 'DD/MM/YYYY'), 3.68)
  INTO RESULT_VALUES (Result_ID, Assignment_ID, Date_Earned, Hours) VALUES (51,   230034, to_date('02/02/2020', 'DD/MM/YYYY'), 5)
  INTO RESULT_VALUES (Result_ID, Assignment_ID, Date_Earned, Hours) VALUES (52,   123456, to_date('03/02/2020', 'DD/MM/YYYY'), 10)
  INTO RESULT_VALUES (Result_ID, Assignment_ID, Date_Earned, Hours) VALUES (53,   123456, to_date('04/02/2020', 'DD/MM/YYYY'), 5)
  INTO RESULT_VALUES (Result_ID, Assignment_ID, Date_Earned, Hours) VALUES (60,   123456, to_date('05/02/2020', 'DD/MM/YYYY'), 5)
  INTO RESULT_VALUES (Result_ID, Assignment_ID, Date_Earned, Hours) VALUES (90,   123456, to_date('06/02/2020', 'DD/MM/YYYY'), 5)
  INTO RESULT_VALUES (Result_ID, Assignment_ID, Date_Earned, Hours) VALUES (2384, 123456, to_date('07/02/2020', 'DD/MM/YYYY'), 10)
SELECT * FROM dual;
Expected Result = 38.68
Here's one solution, even though it's odd that you want to do this. The adjusted fiddle: Working test case.
This increments by 0.1 to find the matching row:
SELECT n.n
FROM ( SELECT TO_NUMBER(LEVEL)/10 - 1 n FROM DUAL CONNECT BY LEVEL <= 1000 ) n
WHERE n.n = (
    SELECT round((sum(P2.HOURS)),1) FTE
    FROM ASSIGNMENTS_M P1, RESULT_HOURS P2
    WHERE P2.date_earned BETWEEN to_date('2020/01/01','YYYY/MM/DD') AND to_date('2020/10/31','YYYY/MM/DD')
      AND P1.ASSIGNMENT_ID = 123456
    GROUP BY P1.ASSIGNMENT_ID
)
;
This increments by 1 to find the matching row, but adjusts the calculation to allow this:
SELECT n.n / 10
FROM ( SELECT TO_NUMBER(LEVEL) - 1 n FROM DUAL CONNECT BY LEVEL <= 1000 ) n
WHERE n.n = (
    SELECT round((sum(P2.HOURS)),1) FTE
    FROM ASSIGNMENTS_M P1, RESULT_HOURS P2
    WHERE P2.date_earned BETWEEN to_date('2020/01/01','YYYY/MM/DD') AND to_date('2020/10/31','YYYY/MM/DD')
      AND P1.ASSIGNMENT_ID = 123456
    GROUP BY P1.ASSIGNMENT_ID
) * 10
;
The result: none of your sums match the number sequence generated by the n derived table. Running the aggregate on its own shows why:
SELECT p1.assignment_id, round((sum(P2.HOURS)),1) FTE
FROM ASSIGNMENTS_M P1, RESULT_HOURS P2
WHERE P2.date_earned BETWEEN to_date('2020/01/01','YYYY/MM/DD') AND to_date('2020/10/31','YYYY/MM/DD')
AND P1.ASSIGNMENT_ID = 123456
GROUP BY P1.ASSIGNMENT_ID
;
Result:
+--------+------+
| id     | fte  |
+--------+------+
| 123456 | 43.7 |
+--------+------+
That's the reason. Now how do you want to change this logic?
Do you want an approximate comparison or do you want your sequence to be in 0.1 increments?
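For instance, if an approximate comparison turned out to be acceptable, one hypothetical variant (my assumption, not something the asker confirmed) is to round the sum to the nearest whole hour before comparing it against the integer sequence:
SELECT n.n
FROM ( SELECT TO_NUMBER(LEVEL) - 1 n FROM DUAL CONNECT BY LEVEL <= 1000 ) n
WHERE n.n = (
    SELECT ROUND(SUM(P2.HOURS))   -- nearest whole hour
    FROM ASSIGNMENTS_M P1, RESULT_HOURS P2
    WHERE P2.date_earned BETWEEN to_date('2020/01/01','YYYY/MM/DD') AND to_date('2020/10/31','YYYY/MM/DD')
      AND P1.ASSIGNMENT_ID = 123456
    GROUP BY P1.ASSIGNMENT_ID
);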

SQL Server - Manipulate data and return new results

Is there any way in SQL Server to manipulate data the way you would manipulate arrays in other programming languages?
I have one SQL query that returns 3 columns, "dt_ref" (date), "vlr_venda" (float) and "qt_parcelas" (int)
Basically, I need to do something like this:
- When the field "qt_parcelas" is higher than 1, I need to "loop" over that row and generate that many rows (3 in the example below).
So I need to divide the field "vlr_venda" by the field "qt_parcelas", use the field "dt_ref" as the starting date, and increment the month once per generated row, "qt_parcelas" times.
For example, if my query returns these structure:
| dt_ref   | vlr_venda | qt_parcelas |
--------------------------------------
| 20180901 | 3000      | 3           |
I need to do something to return this:
| dt_ref   | vlr_venda |
------------------------
| 20180901 | 1000      |
| 20181001 | 1000      |
| 20181101 | 1000      |
Is it possible to do it in SQL Server?
I've searched for something like this but haven't found anything useful...
Any ideas?
You can use a recursive CTE: Sql Fiddle
with cte as (
      select dt_ref, vlr_venda / qt_parcelas as new_val, qt_parcelas, 1 as num
      from t
      union all
      select dateadd(month, 1, dt_ref), new_val, qt_parcelas, num + 1
      from cte
      where num < qt_parcelas
     )
select dt_ref, new_val
from cte;
As written, this will work for up to 100 months (the default recursion limit). You need to add OPTION (MAXRECURSION 0) for longer periods; see the sketch below for where it goes.
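For reference, the OPTION clause goes on the outer statement that consumes the CTE, not inside the CTE definition (a minimal sketch of the same query):
with cte as (
      select dt_ref, vlr_venda / qt_parcelas as new_val, qt_parcelas, 1 as num
      from t
      union all
      select dateadd(month, 1, dt_ref), new_val, qt_parcelas, num + 1
      from cte
      where num < qt_parcelas
     )
select dt_ref, new_val
from cte
option (maxrecursion 0);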
Instead of using a recursive CTE you could use a tally table. If your numbers are much larger than 3, this will probably be far more efficient:
WITH N AS (
    SELECT n
    FROM (VALUES(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL)) N(n)),
Tally AS (
    SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) I
    FROM N N1 --10 rows
    CROSS JOIN N N2 --100 rows
    --CROSS JOIN N N3 --Keep adding more CROSS JOINs to create more rows
),
VTE AS (
    SELECT CONVERT(date, V.dt_ref) AS dt_ref,
           V.vlr_venda,
           V.qt_parcelas
    FROM (VALUES('20180901', 3000, 3),
                ('20181001', 12000, 6)) V(dt_ref, vlr_venda, qt_parcelas))
SELECT DATEADD(MONTH, T.I - 1, V.dt_ref), --T.I - 1 so the first row keeps the original date
       CONVERT(decimal(10,4), V.vlr_venda / (V.qt_parcelas * 1.0)) --in case you need decimal points
FROM VTE V
JOIN Tally T ON V.qt_parcelas >= T.I;
I developed software to generate tickets and had a similar experience to yours. I tried CURSORs and recursive CTEs, and they all took something like 50 minutes when creating tickets for clients.
I used this function to replicate my clients and generate my tickets:
/****** Object: UserDefinedFunction [dbo].[NumbersTable] Script Date: 28/09/2018 10:51:25 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE FUNCTION [dbo].[NumbersTable] (
    @fromNumber int,
    @toNumber int,
    @byStep int
)
RETURNS @NumbersTable TABLE (i int)
AS
BEGIN
    WITH CTE_NumbersTable AS (
        SELECT @fromNumber AS i
        UNION ALL
        SELECT i + @byStep
        FROM CTE_NumbersTable
        WHERE (i + @byStep) <= @toNumber
    )
    INSERT INTO @NumbersTable
    SELECT i FROM CTE_NumbersTable OPTION (MAXRECURSION 0);
    RETURN;
END
GO
Then you can use
CROSS APPLY dbo.NumbersTable(1, qt_parcelas, 1)
to generate your rows.
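For the question's data that could look like this (a sketch; your_table stands in for the asker's actual table, and i is the column returned by the function above):
SELECT DATEADD(MONTH, n.i - 1, t.dt_ref) AS dt_ref,
       t.vlr_venda / (t.qt_parcelas * 1.0) AS vlr_venda
FROM your_table t
CROSS APPLY dbo.NumbersTable(1, t.qt_parcelas, 1) n;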
Believe me, this way is more efficient: when dealing with a large amount of data (something like 8 to 10 million rows) it takes something like 2 minutes instead of 40.

SQL: create sequential list of numbers from various starting points

I'm stuck on this SQL problem.
I have a column that is a list of starting points (prevdoc), and another column that lists how many sequential numbers I need after each starting point (exdiff).
For example, here are the first several rows:
prevdoc | exdiff
--------+-------
      1 |      3
     21 |      2
    126 |      2
So I need an output to look something like:
2
3
4
22
23
127
128
I'm lost as to where even to start. Can anyone advise me on the SQL code for this solution?
Thanks!
;with a as
(
    select prevdoc + 1 col, exdiff
    from <table>
    where exdiff > 0
    union all
    select col + 1, exdiff - 1
    from a
    where exdiff > 1
)
select col
from a
order by col;
If your exdiff is going to be a small number, you can make up a virtual table of numbers using SELECT..UNION ALL as shown here and join to it:
select prevdoc+number
from doc
join (select 1 number union all
select 2 union all
select 3 union all
select 4 union all
select 5) x on x.number <= doc.exdiff
order by 1;
I have provided 5 numbers, but you can expand as required. You haven't specified your DBMS, but each one has a source of sequential numbers; for example, in SQL Server you could use:
select prevdoc+number
from doc
join master..spt_values v on
v.number <= doc.exdiff and
v.number >= 1 and
v.type = 'p'
order by 1;
The master..spt_values table contains numbers between 0-2047 (when filtered by type='p').
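A quick sanity check of what is available there (my own verification query, not part of the original answer):
select min(number) as lo, max(number) as hi, count(*) as cnt
from master..spt_values
where type = 'P';
-- lo = 0, hi = 2047, cnt = 2048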
If the numbers are not too large, then you can use the following trick in most databases:
select t.prevdoc + seqnum
from t join
     (select row_number() over (order by column_name) as seqnum
      from INFORMATION_SCHEMA.columns
     ) nums
     on seqnum <= t.exdiff
The use of INFORMATION_SCHEMA.columns in the subquery is arbitrary. Its only purpose is to generate a sequence of numbers at least as long as the maximum exdiff value.
This approach will work in any database that supports the ranking functions. Most databases also have a database-specific way of generating a sequence (such as recursive CTEs in SQL Server and CONNECT BY in Oracle).
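For completeness, the recursive-CTE variant would look something like this in SQL Server (a sketch; the 1000 cap is an arbitrary choice of mine):
with nums as (
      select 1 as seqnum
      union all
      select seqnum + 1
      from nums
      where seqnum < 1000
     )
select t.prevdoc + nums.seqnum
from t join
     nums
     on nums.seqnum <= t.exdiff
option (maxrecursion 0);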

What is a SQL frequency-distribution query to count ranges with group-by, and include 0 counts?

Given:
table 'thing':
age
---
3.4
3.4
10.1
40
45
49
I want to count the number of things for each 10-year range, e.g.,
age_range | count
----------+------
        0 |     2
       10 |     1
       20 |     0
       30 |     0
       40 |     3
This query comes close:
SELECT FLOOR(age / 10) as age_range, COUNT(*)
FROM thing
GROUP BY FLOOR(age / 10)
ORDER BY FLOOR(age / 10);
Output:
age_range | count
----------+------
        0 |     2
        1 |     1
        4 |     3
However, it doesn't show me the ranges which have 0 counts. How can I modify the query so that it also shows the ranges in between with 0 counts?
I found similar Stack Overflow questions for counting ranges, some handling 0 counts, but they involve having to specify each range (either hard-coding the ranges into the query, or putting the ranges in a table). I would prefer to use a generic query like the one above where I do not have to explicitly specify each range (e.g., 0-10, 10-20, 20-30, ...). I'm using PostgreSQL 9.1.3.
Is there a way to modify the simple query above to include 0 counts?
Similar:
Oracle: how to "group by" over a range?
Get frequency distribution of a decimal range in MySQL
generate_series to the rescue:
select 10 * s.d, count(t.age)
from generate_series(0, 10) s(d)
left outer join thing t on s.d = floor(t.age / 10)
group by s.d
order by s.d
Figuring out the upper bound for generate_series should be trivial with a separate query; I just used 10 as a placeholder.
This:
generate_series(0, 10) s(d)
essentially generates an inline table called s with a single column d which contains the values from 0 to 10 (inclusive).
You could wrap the two queries (one to figure out the range, one to compute the counts) into a function if necessary.
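If you'd rather not run a separate query, the bound can also be computed inline (a sketch; the floor(max(age) / 10) subquery is my guess at what that separate query would be):
select 10 * s.d, count(t.age)
from generate_series(
         0,
         (select floor(max(age) / 10)::int from thing)
     ) s(d)
left outer join thing t on s.d = floor(t.age / 10)
group by s.d
order by s.d;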
You need some way to invent the table of age ranges. Row number usually works nicely. Do a cartesian product against a big table to get lots of numbers.
WITH ranges AS (
    SELECT (rownum - 1) * 10 AS age_range
    FROM ( SELECT row_number() OVER () AS rownum
           FROM pg_tables
         ) n,
         ( SELECT ceil(max(age) / 10) AS range_end
           FROM thing
         ) m
    WHERE n.rownum <= m.range_end
)
SELECT r.age_range, COUNT(t.age) AS count
FROM ranges r
LEFT JOIN thing t ON r.age_range = FLOOR(t.age / 10) * 10
GROUP BY r.age_range
ORDER BY r.age_range;
EDIT: mu is too short has a much more elegant answer, but if you didn't have a generate_series function on the db, ... :)

Returning several rows from a single query, based on a value of a column

Let's say I have this table:
|Fld | Number|
1 5
2 2
And I want to make a select that repeats each Fld as many times as its Number field says:
| Fld |
|  1  |
|  1  |
|  1  |
|  1  |
|  1  |
|  2  |
|  2  |
How can I achieve this? I was thinking about making a temporary table and inserting data based on the Number, but I was wondering if this could be done with a single SELECT statement.
PS: I'm new to SQL
You can join with a numbers table:
SELECT Fld
FROM yourtable
JOIN Numbers
ON yourtable.Number <= Numbers.Number
A numbers table is just a table with a list of numbers:
Number
1
2
3
etc...
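If you don't already have one, a numbers table is easy to create and fill (a sketch; the table and column names are my own, and multi-row VALUES works in most modern engines):
CREATE TABLE Numbers (Number int PRIMARY KEY);

INSERT INTO Numbers (Number)
VALUES (1), (2), (3), (4), (5);   -- extend as far as your largest Number value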
Not a great solution (since you still query your table twice), but maybe you can work from it:
SELECT t1.fld, t1.number
FROM yourtable t1, (
      SELECT ROWNUM number FROM dual
      CONNECT BY LEVEL <= (SELECT MAX(number) FROM yourtable)) t2
WHERE t2.number <= t1.number
It generates the maximum count of rows needed and then filters it down for each row.
I don't know if your RDBMS version supports it (although I rather suspect it does), but here is a recursive version:
WITH remaining (fld, times) as (SELECT fld, 1
                                FROM <table>
                                UNION ALL
                                SELECT a.fld, a.times + 1
                                FROM remaining as a
                                JOIN <table> as b
                                  ON b.fld = a.fld
                                 AND b.number > a.times)
SELECT fld
FROM remaining
ORDER BY fld
Given your source data table, it outputs this (count included for verification):
fld   times
=============
  1       1
  1       2
  1       3
  1       4
  1       5
  2       1
  2       2