I am interested in generating a series of consecutive integers from 1 to 1000 (for example) and storing these numbers in each row of some table. I would like to do this in Microsoft Azure SQL but I am not sure if arrays are even supported.
One relatively simple method is a recursive CTE:
with n as (
    select 1 as n
    union all
    select n + 1
    from n
    where n < 1000
)
select n.n
from n
option (maxrecursion 0);
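Since the goal is to store these numbers in the rows of a table, the same CTE can feed an INSERT directly. A minimal sketch, assuming a hypothetical pre-created target table dbo.Numbers with a single int column n:
with n as (
    select 1 as n
    union all
    select n + 1
    from n
    where n < 1000
)
insert into dbo.Numbers (n)   -- dbo.Numbers is an assumed target table, not from the question
select n.n
from n
option (maxrecursion 0);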
Another way to solve something like this is to use a SEQUENCE for the table. It's similar to an IDENTITY column (they actually have the same behavior under the covers) but without some of the restrictions. Just reset it to a new seed value as you add data to the table.
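A minimal sketch of the SEQUENCE approach (T-SQL); the table dbo.Items and sequence dbo.ItemSeq are illustrative names, not anything from the question:
-- create a sequence and use it to number rows as they are inserted
create sequence dbo.ItemSeq as int start with 1 increment by 1;

insert into dbo.Items (n, payload)
values (next value for dbo.ItemSeq, 'some data');

-- reset the sequence to a new seed value before loading the next batch
alter sequence dbo.ItemSeq restart with 1001;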
Related
I need to generate a serial number in an Oracle SQL query.
Example:
rownum (1,2..9,10,11..18,19,20..N)
my_srl_no (1,2..9,1,2..9,1,2..N)
What you are looking for is the modulo function MOD (https://docs.oracle.com/en/database/oracle/oracle-database/19/sqlrf/MOD.html#GUID-E12A3928-2C50-45B0-B8C3-82432C751B8C).
If it's really rownum you want to deal with:
mod(rownum - 1, 9) + 1
but usually you would rather use ROW_NUMBER to number your rows by some sort criterion. Either way, the math stays the same.
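A minimal sketch of the ROW_NUMBER variant, assuming a hypothetical table my_table ordered by an illustrative column created_at:
SELECT t.*,
       MOD(ROW_NUMBER() OVER (ORDER BY t.created_at) - 1, 9) + 1 AS my_srl_no
FROM my_table t;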
Given the following table, the question is how to find, for example, the top N values of C2 for each C1.
C1 C2
1 1
1 2
1 3
1 4
1 ...
2 1
2 2
2 3
2 4
2 ...
....
So if N = 3, the results are
C1 C2
1 1
1 2
1 3
2 1
2 2
2 3
....
The proposed solutions use a window function with PARTITION BY:
Select top 10 records for each category
https://www.the-art-of-web.com/sql/partition-over/
For example,
SELECT rs.Field1, rs.Field2
FROM (
    SELECT Field1, Field2,
           Rank() over (Partition BY Section
                        ORDER BY RankCriteria DESC) AS Rank
    FROM table
) rs
WHERE Rank <= 3
I guess what it does is sort within each partition and then pick the top N.
However, if a category has fewer than N elements, we could get its top N without sorting, because the top N must include every element in the category.
The above query uses Rank(). My question applies to other window functions like row_number() or dense_rank() as well.
Is there a way to skip the sorting in that case?
I am also not sure whether the underlying engine can optimize this case: that is, whether the inner partition/order takes the outer WHERE constraint into account before sorting.
Using partition + order + where is one way to get the top N elements from each category. It works well if each category has more than N elements, but incurs an unnecessary sorting cost otherwise. My question is whether there is another approach that works well in both cases. Ideally it would do the following:
for each category {
    if number of elements <= N:
        take all rows            # no sort needed
    else:
        sort and take the top N
}
For example, something like the following, but is there better SQL?
WITH table_with_count AS (
    SELECT Field1, Field2, RankCriteria,
           count(*) over (PARTITION BY Section) as c
    FROM table
),
rs AS (
    SELECT Field1, Field2,
           Rank() over (PARTITION BY Section
                        ORDER BY RankCriteria DESC) AS Rank
    FROM table_with_count
    WHERE c > 10
)
(SELECT Field1, Field2 FROM rs WHERE Rank <= 10)
UNION
(SELECT Field1, Field2 FROM table_with_count WHERE c <= 10)
No, and there really shouldn't be. Overall, what you describe here is an XY problem.
You seem to:
Worry about sorting, while in fact sorting (with optional secondary sort) is the most efficient way of shuffling / repartitioning data, as it doesn't lead to proliferation of file descriptors. In practice Spark strictly prefers sort over alternatives (hashing) for exactly that reason.
Worry about "unnecessary" sorting of small groups, when in fact the problem is intrinsic inefficiency of window functions, which require full shuffle of all data, therefore exhibit the same behavior pattern as infamous groupByKey.
There are more efficient patterns (MLPairRDDFunctions.topByKey being the most prominent example) but these haven't been ported to Dataset API, and would require custom Aggregator It is also possible to approximate selection (for example through quantile approximation), but this increases the number of passes over data, and in many cases won't provide any performance gains.
This is too long for a comment.
There is no such optimization. Basically, all the data is sorted when using windowing clauses. I suppose that a database engine could actually use a hash algorithm for the partition by and a sort algorithm for the order by, but I don't think that is a common approach.
In any case, the operation is over the entire set, and it should be optimized for this purpose. Trying not to order a subset would add lots of overhead -- for instance, running the sort multiple times for each subset and counting the number of rows in each subset.
Also note that the comparison to "3" occurs (logically) after the window function. I don't think window functions are typically optimized for such post-filtering (although once again, it is a possible optimization).
I have been reading about WITH queries in Postgres, and this is what surprised me:
WITH RECURSIVE t(n) AS (
VALUES (1)
UNION ALL
SELECT n+1 FROM t WHERE n < 100
)
SELECT sum(n) FROM t;
I'm not able to understand how the evaluation of the query works.
t(n) sounds like a function with a parameter. How is the value of n passed?
Any insight into how the recursive statement is broken down in SQL would be appreciated.
This is called a common table expression and is a way of expressing a recursive query in SQL:
t(n) defines the name of the CTE as t, with a single column named n. It's similar to an alias for a derived table:
select ...
from (
...
) as t(n);
The recursion starts with the value 1 (that's the VALUES (1) part) and then recursively adds one to it for as long as n < 100, so the last row produced is 100. It therefore generates the numbers from 1 to 100. The final query then sums up all those numbers (5050).
n is a column name, not a "variable" and the "assignment" happens in the same way as any data retrieval.
WITH RECURSIVE t(n) AS (
VALUES (1) --<< this is the recursion "root"
UNION ALL
SELECT n+1 FROM t WHERE n < 100 --<< this is the "recursive part"
)
SELECT sum(n) FROM t;
If you "unroll" the recursion (which in fact is an iteration) then you'd wind up with something like this:
select x.n + 1
from (
select x.n + 1
from (
select x.n + 1
from (
select x.n + 1
from (
values (1)
) as x(n)
) as x(n)
) as x(n)
) as x(n)
More details in the manual:
https://www.postgresql.org/docs/current/static/queries-with.html
If you are looking for how it is evaluated, the recursion occurs in two phases.
The root is executed once.
The recursive part is executed until no rows are returned. The documentation is a little vague on that point.
Now, normally in databases, we think of "function" in a different way than we think of them when we do imperative programming. In database terms, the best way to think of a function is "a correspondence where for every domain value you have exactly one corresponding value." So one of the immediate challenges is to stop thinking in terms of programming functions. Even user-defined functions are best thought about in this other way since it avoids a lot of potential nastiness regarding the intersection of running the query and the query planner... So it may look like a function but that is not correct.
Instead the WITH clause uses a different, almost inverse notation. Here you have the set name t, followed (optionally in this case) by the tuple structure (n). So this is not a function with a parameter, but a relation with a structure.
So how this breaks down:
SELECT 1 as n where n < 100
UNION ALL
SELECT n + 1 FROM (SELECT 1 as n) where n < 100
UNION ALL
SELECT n + 1 FROM (SELECT n + 1 FROM (SELECT 1 as n)) where n < 100
Of course that is a simplification: internally the database keeps track of the CTE state and keeps joining against the last iteration, so in practice this folds back to near-linear complexity (while the diagram above would suggest much worse performance than that).
So in reality you get something more like:
SELECT 1 as n where 1 < 100
UNION ALL
SELECT 1 + 1 as n where 1 + 1 < 100
UNION ALL
SELECT 2 + 1 AS n WHERE 2 + 1 < 100
...
In essence the previous values carry over.
I have a process that needs to select rows from a table of queued items. Each row has a quantity column, and I need to select rows where the quantities add up to a specific multiple. The multiple is typically around 4, 8, or 10, but it could in theory be any multiple (odd or even).
Any suggestions on how to select rows where the sum of a field is a specified multiple?
My first thought would be to use some kind of MOD function, which I believe in SQL Server is the % operator. So the criteria would be something like this:
WHERE MyField % 4 = 0 OR MyField % 8 = 0
It might not be that fast, so another way might be to make a temp table containing, say, the first 100 values of the X times table (where X is the multiple you are looking for) and join on that.
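A minimal sketch of that idea in SQL Server, assuming the multiple is 4 and a hypothetical dbo.QueueItems table with a Quantity column (all names are illustrative):
DECLARE @x int = 4;

-- build a small "times table" of the first 100 multiples of @x
CREATE TABLE #Multiples (Target int PRIMARY KEY);
INSERT INTO #Multiples (Target)
SELECT TOP (100) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) * @x
FROM sys.all_objects;

-- rows whose quantity lands exactly on one of those multiples
SELECT q.*
FROM dbo.QueueItems AS q
JOIN #Multiples AS m
    ON q.Quantity = m.Target;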
I need to show more than one result row for each record in a table. I need to do this with a single SQL statement; I don't want to use a cursor.
This may seem silly, but the number of rows varies for each item. I need this in order to print the information afterwards as a Crystal Reports detail.
Suppose I have this table:
idItem Cantidad <more fields>
-------- -----------
1000 3
2000 2
3000 5
4000 1
I need this result, using only one SQL statement:
1000
1000
1000
2000
2000
3000
3000
3000
3000
3000
4000
where each idItem has Cantidad rows.
Any ideas?
It seems like something that should be handled in the UI (or the report). I don't know Crystal Reports well enough to make a suggestion there. If you really, truly need to do it in SQL, then you can use a Numbers table (or something similar):
SELECT
idItem
FROM
Some_Table ST
INNER JOIN Numbers N ON
N.number > 0 AND
N.number <= ST.cantidad
You can replace the Numbers table with a subquery or function or whatever other method you want to generate a result set of numbers that is at least large enough to cover your largest cantidad.
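For instance, a minimal sketch that generates the numbers with a recursive CTE instead of a permanent Numbers table (SQL Server syntax), assuming the largest cantidad never exceeds 1000 and reusing Some_Table from the query above:
WITH Numbers (number) AS (
    SELECT 1
    UNION ALL
    SELECT number + 1 FROM Numbers WHERE number < 1000
)
SELECT ST.idItem
FROM Some_Table AS ST
INNER JOIN Numbers AS N
    ON N.number <= ST.cantidad
OPTION (MAXRECURSION 1000);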
Check out UNPIVOT (MSDN)
If you use a "numbers" table that is useful for this and many similar purposes, you can use the following SQL:
select t.idItem
from myTable t
join numbers n on n.num between 1 and t.Cantidad
order by t.idItem
The numbers table should just contain all integer numbers from 0 or 1 up to a number big enough so that Cantidad never exceeds it.
As others have said, you need a Numbers or Tally table, which is just a sequential list of integers. However, if you knew that Cantidad was never going to be larger than, say, five, you could do something like:
Select idItem
From Table
Join (
Select 1 As Value
Union All Select 2
Union All Select 3
Union All Select 4
Union All Select 5
) As Numbers
On Numbers.Value <= Table.Cantidad
If you are using SQL Server 2005 or later, you can use a recursive CTE to do it:
With Numbers As
(
    Select 1 As Value
    Union All
    Select N.Value + 1
    From Numbers As N
    Where N.Value < 1000 -- cap chosen so it covers the largest expected Cantidad
)
Select idItem
From Table
Join Numbers As N
    On N.Value <= Table.Cantidad
Option (MaxRecursion 0);