Generate grid line coordinates using SQL - sql

Oracle 18c:
Using an SQL query, I want to generate a list of coordinates that make up the line segments of a square grid graph:
STARTPOINT_X STARTPOINT_Y ENDPOINT_X ENDPOINT_Y
------------ ------------ ---------- ----------
0 0 1 0 --horizontal lines
1 0 2 0
2 0 3 0
3 0 4 0
4 0 5 0
5 0 6 0
...
0 0 0 1 --vertical lines
0 1 0 2
0 2 0 3
0 3 0 4
0 4 0 5
0 5 0 6
...
[220 rows selected]
Details:
The lines would be split at each intersection. So, in the image above, there are 220 lines. Each line is composed of two vertices.
Ideally, I would have the option of specifying in the query what the overall grid dimensions would be. For example, specify this somewhere in the SQL: DIMENSIONS = 10 x 10 (or DIMENSIONS = 100 x 100, etc.).
To keep things simple, we can assume the grid's overall shape will always be a square (length = width). And we can make the cell size 1 unit.
I've supplied sample data in this db<>fiddle. I created that data using Excel.
Hint: The vertical grid lines start at row 111.
The reason I want to generate this data is:
I want sample line data to work with when testing Oracle Spatial queries. Sometimes I need a few hundred lines. Other times, I need thousands of lines.
Also, if the lines are in a grid, then it will be obvious if any lines are missing in my results (by looking at the data in mapping software and spotting gaps).
How can I generate those grid line coordinates using SQL?
Related: Generate grid line features using SQL

You can use:
WITH range (v) AS (
SELECT LEVEL - 1 FROM DUAL CONNECT BY LEVEL <= 11
)
SELECT x.v AS startpoint_x,
y.v AS startpoint_y,
x.v + 1 AS endpoint_x,
y.v AS endpoint_y
FROM range x CROSS JOIN range y
WHERE x.v <= 9
UNION ALL
SELECT x.v AS startpoint_x,
y.v AS startpoint_y,
x.v AS endpoint_x,
y.v + 1 AS endpoint_y
FROM range x CROSS JOIN range y
WHERE y.v <= 9
or, more generally:
WITH range (v) AS (
SELECT LEVEL - 1 FROM DUAL CONNECT BY LEVEL - 1 <= GREATEST(:max_x, :max_y)
)
SELECT x.v AS startpoint_x,
y.v AS startpoint_y,
x.v + 1 AS endpoint_x,
y.v AS endpoint_y
FROM range x CROSS JOIN range y
WHERE x.v < :max_x
AND y.v <= :max_y
UNION ALL
SELECT x.v AS startpoint_x,
y.v AS startpoint_y,
x.v AS endpoint_x,
y.v + 1 AS endpoint_y
FROM range x CROSS JOIN range y
WHERE x.v <= :max_x
AND y.v < :max_y
db<>fiddle here

There's a few ways to generate rows in Oracle. Note: This particular (recursive) way might not be optimal for very large grids, for that you might want to cross join 2 rows a bunch of times, however, this way is more amenable to injecting a variable for your dimension.
Selecting from the magic dual table usually returns 1 row but you can use the recursive connect by with the magic level value to determine how many rows you want. It doesn't return a 0-level so I hard-coded that in.
Looking at your square, its a mirror image made up of single unit vectors; all the horizontal vectors are repeated vertically, so only half have to be generated. Note the union all in the final query just returns the same data but swaps the x and y points.
It cross joins dimension CTE 3 times. The first 2 are to get the start & end and only a 3rd because for all the e.g. horizontal vectors we just want the vertical coordinates to be the same for both start and end. It filters out where start & end are equal as those are zero-length vectors which are not needed as well as those longer than length 1 using where b.point - a.point = 1 .
with dimension as (
select 0 as point from dual
union all
select level
from dual
connect by level <= 10
), points as (
select
a.point as startpoint,
b.point as endpoint,
c.point as fixed
from dimension a
cross join dimension b
cross join dimension c
where b.point - a.point = 1
)
select
startpoint as startpoint_x,
fixed as startpoint_y,
endpoint as endpoint_x,
fixed as endpoint_y
from points
union all
select
fixed as startpoint_x,
startpoint as startpoint_y,
fixed as endpoint_x,
endpoint as endpoint_y
from points
order by startpoint_y, endpoint_y, startpoint_x, endpoint_x
The place where you would inject the variable is on line 6, replacing that 10 with whatever grid size you want connect by level <= 10.
In a SQL*Plus script you could do that like
define dimension = 10;
with ...[ rest of the query blah blah ]
connect by level <= &dimension

Related

SQL Max Consecutive Values in a number set using recursion

The following SQL query is supposed to return the max consecutive numbers in a set.
WITH RECURSIVE Mystery(X,Y) AS (SELECT A AS X, A AS Y FROM R)
UNION (SELECT m1.X, m2.Y
FROM Mystery m1, Mystery m2
WHERE m2.X = m1.Y + 1)
SELECT MAX(Y-X) + 1 FROM Mystery;
This query on the set {7, 9, 10, 14, 15, 16, 18} returns 3, because {14 15 16} is the longest chain of consecutive numbers and there are three numbers in that chain. But when I try to work through this manually I don't see how it arrives at that result.
For example, given the number set above I could create two columns:
m1.x
m2.y
7
7
9
9
10
10
14
14
15
15
16
16
18
18
If we are working on rows and columns, not the actual data, as I understand it WHERE m2.X = m1.Y + 1 takes the value from the next row in Y and puts it in the current row of X, like so
m1.X
m2.Y
9
7
10
9
14
10
15
14
16
15
18
16
18
Null?
The main part on which I am uncertain is where in the SQL recursion actually happens. According to Denis Lukichev recursion is the R part - or in this case the RECURSIVE Mystery(X,Y) - and stops when the table is empty. But if the above is true, how would the table ever empty?
Since I don't know how to proceed with the above, let me try a different direction. If WHERE m2.X = m1.Y + 1 is actually a comparison, the result should be:
m1.X
m2.Y
14
14
15
15
16
16
But at this point, it seems that it should continue recursively on this until only two rows are left (nothing else to compare). If it stops here to get the correct count of 3 rows (2 + 1), what is actually stopping the recursion?
I understand that for the above example the MAX(Y-X) + 1 effectively returns the actual number of recursion steps and adds 1.
But if I have 7 consecutive numbers and the recursion flows down to 2 rows, should this not end up with an incorrect 3 as the result? I understand recursion in C++ and other languages, but this is confusing to me.
Full disclosure, yes it appears this is a common university question, but I am retired, discovered this while researching recursion for my use, and need to understand how it works to use similar recursion in my projects.
Based on this db<>fiddle shared previously, you may find it instructive to alter the CTE to include an iteration number as follows, and then to show the content of the CTE rather than the output of final SELECT. Here's an amended CTE and its content after the recursion is complete:
Amended CTE
WITH RECURSIVE Mystery(X,Y) AS ((SELECT A AS X, A AS Y, 1 as Z FROM R)
UNION (SELECT m1.X, m2.A, Z+1
FROM Mystery m1
JOIN R m2 ON m2.A = m1.Y + 1))
CTE Content
x
y
z
7
7
1
9
9
1
10
10
1
14
14
1
15
15
1
16
16
1
18
18
1
9
10
2
14
15
2
15
16
2
14
16
3
The Z field holds the iteration count. Where Z = 1 we've simply got the rows from the table R. The, values X and Y are both from the field A. In terms of what we are attempting to achieve these represent sequences consecutive numbers, which start at X and continue to (at least) Y.
Where Z = 2, the second iteration, we find all the rows first iteration where there is a value in R which is one higher than our Y value, or one higher than the last member of our sequence of consecutive numbers. That becomes the new highest number, and we add one to the number of iterations. As only three numbers in our original data set have successors within the set, there are only three rows output in the second iteration.
Where Z = 3, the third iteration, we find all the rows of the second iteration (note we are not considering all the rows of the first iteration again), where there is, again, a value in R which is one higher than our Y value, or one higher than the last member of our sequence of consecutive numbers. That, again, becomes the new highest number, and we add one to the number of iterations.
The process will attempt a fourth iteration, but as there are no rows in R where the value is one more than the Y values from our third iteration, no extra data gets added to the CTE and recursion ends.
Going back to the original db<>fiddle, the process then searches our CTE content to output MAX(Y-X) + 1, which is the maximum difference between the first and last values in any consecutive sequence, plus one. This finds it's value from the record produced in the third iteration, using ((16-14) + 1) which has a value of 3.
For this specific piece of code, the output is always equivalent to the value in the Z field as every addition of a row through the recursion adds one to Z and adds one to Y.

Grouping rows so a column sums to no more than 10 per group

I have a table that looks like:
col1
------
2
2
3
4
5
6
7
with values sorted in ascending order.
I want to assign each row to groups with labels 0,1,...,n so that each group has a total of no more than 10. So in the above example it would look like this:
col1 |label
------------
2 0
2 0
3 0
4 1
5 1
6 2
7 3
I tried using this:
floor(sum(col1) OVER (partition by ORDER BY col1 ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) /10))
But this doesn't work correctly because it is performing the operations
as:
floor(2/10) = 0
floor([2+2]/10) = 0
floor([2+2+3]/10) = 0
floor([2+2+3+4]/10) = 1
floor([2+2+3+4+5]/10 = 1
floor([2+2+3+4+5+6]/10 = 2
floor([2+2+3+4+5+6+7]/10) = 2
It's all coincidentally correct until the last calculation, because even though
[2+2+3+4+5+6+7] / 10 = 2.9
and
floor(2.9) = 2
what it should do is realise 6+7 is > 10 so the 5th row with value 7 needs be in its own group so iterate the group number + 1 and allocate this row into a new group.
What I really want it to do is when it encounters a sum > 10 then set group number = group number + 1, allocate the CURRENT ROW into this new group, and then finally set the new start row to be the CURRENT ROW.
This is too long for a comment.
Solving this problem requires scanning the table, row-by-row. In SQL, this would be through a recursive CTE (or hierarchical query). Hive supports neither of these.
The issue is that each time a group is defined, the difference between 10 and the sum is "forgotten". That is, when you are further down in the list, what happens earlier on is not a simple accumulation of the available data. You need to know how it was split into groups.
A related problem is solvable. The related problem would assign all rows to groups of size 10, splitting rows between two groups. Then you would know what group a later row is in based only on the cumulative sum of the previous rows.

The King's March

You’re given a chess board with dimension n x n. There’s a king at the bottom right square of the board marked with s. The king needs to reach the top left square marked with e. The rest of the squares are labeled either with an integer p (marking a point) or with x marking an obstacle. Note that the king can move up, left and up-left (diagonal) only. Find the maximum points the king can collect and the number of such paths the king can take in order to do so.
Input Format
The first line of input consists of an integer t. This is the number of test cases. Each test case contains a number n which denotes the size of board. This is followed by n lines each containing n space separated tokens.
Output Format
For each case, print in a separate line the maximum points that can be collected and the number of paths available in order to ensure maximum, both values separated by a space. If e is unreachable from s, print 0 0.
Sample Input
3
3
e 2 3
2 x 2
1 2 s
3
e 1 2
1 x 1
2 1 s
3
e 1 1
x x x
1 1 s
Sample Output
7 1
4 2
0 0
Constraints
1 <= t <= 100
2 <= n <= 200
1 <= p <= 9
I think this problem could be solved using dynamic-programing. We could use dp[i,j] to calculate the best number of points you can obtain by going from the right bottom corner to the i,j position. We can calculate dp[i,j], for a valid i,j, based on dp[i+1,j], dp[i,j+1] and dp[i+1,j+1] if this are valid positions(not out of the matrix or marked as x) and adding them the points obtained in the i,j cell. You should start computing from the bottom right corner to the left top, row by row and beginning from the last column.
For the number of ways you can add a new matrix ways and use it to store the number of ways.
This is an example code to show the idea:
dp[i,j] = dp[i+1,j+1] + board[i,j]
ways[i,j] = ways[i+1,j+1]
if dp[i,j] < dp[i+1,j] + board[i,j]:
dp[i,j] = dp[i+1,j] + board[i,j]
ways[i,j] = ways[i+1,j]
elif dp[i,j] == dp[i+1,j] + board[i,j]:
ways[i,j] += ways[i+1,j]
# check for i,j+1
This assuming all positions are valid.
The final result is stored in dp[0,0] and ways[0,0].
Brief Overview:
This problem can be solved through recursive method call, starting from nn till it reaches 00 which is the king's destination.
For the detailed explanation and the solution for this problem,check it out here -> https://www.callstacker.com/detail/algorithm-1

MSSQL 2008R2 Avoid Repeated Items in Matrix Like Table

I tried to find solutions for this and it is somehow easy to solve when records are below a certain number. But...
I have an original list with 81,590 records.
Id Loc Sales LatLong
1 a 100 ...
2 b 110 ...
3 c 105 ...
4 d 125 ...
5 e 123 ...
6 f 35 ...
.
.
.
81,590 ... ... ...
I need to compare all items in the list against each other.
Id L1 L2 Dist
1 a a 0 --> Not needed. Self comparison.
2 a b 26
3 a c 150 --> Not needed. Distance >100.
4 a d 58
5 b a 26 --> Not needed. Repeated record.
6 b b 0 --> Not needed. Self comparison.
7 b c 15
8 b d 151 --> Not needed. Distance >100.
9 c a 150 --> Not needed. Repeated record.
10 c b 15 --> Not needed. Repeated record.
11 c c 0 --> Not needed. Self comparison.
12 c d 75
13 d a 58 --> Not needed. Repeated record.
14 d b 151 --> Not needed. Repeated record.
15 d c 75 --> Not needed. Repeated record.
16 d d 0 --> Not needed. Self comparison.
But as shown next to the records above, the end result needs to be a list that:
1) Compares records against each other ONLY when they are located at a certain distance, say <100 miles.
2) Does not contain duplicates in the sense that comparing Loc1 to Loc2 is the same as comparing Loc2 to Loc1.
3) And the obvious one, no need to compare Loc1 to itself.
The end result would be:
Id L1 L2 Dist
2 a b 26
4 a d 58
7 b c 15
12 c d 75
Approach:
In theory, the total number of records after comparing all items against themselves is 81,590 ^ 2 = 6,656,928,100 records.
Subtracting repeated iterations (LocA-LocB = LocB-LocA) would mean 6,656,928,100 / 2 = 3,328,464,050.
Further cleaning by getting rid of self-repeating iterations (LocA-LocA), should be 3,328,464,050 - 81,590 = 3,328,382,460.
Then I could get rid of all records with distance > 100 miles.
This is highly inefficient, I'd be building a table with 6Bn records, then deleting half, etc. etc. etc.
Is there an approach to arrive to the end product in a much more efficient (less steps, less select/delete/update) way?
What is the select statement needed to insert the final data-set into destination?
It sounds to me like there is a join of the table with itself and a filtering by iterations of the key but here is where I am stuck.
What algorithm are you using to calculate distance between two points? Simple “the world is flat” cartesian math, or that trigonometry-laden “the word is an oblate spherloid” one? This can turn into serious CPU requirements.
It’s probably best to generate a table of “locations that are within distance X of this location” once and store it permanently; barring major events like earthquakes, it’s just not going to change.
Query-wise, the base join is trivial:
SELECT
t1.Loc L1
,t2.Loc L2
from MyTable t1
inner join MyTable t2
on t2.Loc > t1.Loc
If have the distance formula in, say, a function named “distanceFunction”, it might look something like:
WITH cteCalc as (
select
t1.Loc L1
,t2.Loc L2
,dbo.distanceFunction(t1.LatLong, t2.LatLong) Dist
from MyTable t1
inner join MyTable t2
on t2.Loc > t1.Loc
where dbo.distanceFunction(t1.LatLong, t2.LatLong) < #MaxDistance)
INSERT TargetTable (L1, L2, Dist)
SELECT
L1
,L2
,Dist
where Dist <= #MaxDistance
This, of course, may break your system, if only because the transaction log will grow too big while you’re writing a few billion rows to the target table. I'd say build a loop, processing each location in turn, with the final query like:
WITH cteCalc as (
select
t1.Loc L1
,t2.Loc L2
,dbo.distanceFunction(t1.LatLong, t2.LatLong) Dist
from MyTable t1
inner join MyTable t2
on t2.Loc > t1.Loc
where dbo.distanceFunction(t1.LatLong, t2.LatLong) < #MaxDistance
and t1.Loc = #ThisIterationLoc)
INSERT TargetTable (L1, L2, Dist)
SELECT
L1
,L2
,Dist
where Dist <= #MaxDistance
First pass returns 81,589 less whichever are too far away, second pass as 81,588 to process, and so forth.
Here is an outline of how I would solve this problem:
Put indexes on latitude and longitude
Do the math for lat and long of your distance for the range (box) of your distance. Then you know that your distance (as box not a circle) is contained in this delta. You also know that it is not outside of this delta. This constrains the problem considerably.
For example, if the change in lat and long is 10 for your distance then a location at (100,100) your box would be defined by (95,95) and (105,105) values for lat and long.
Write a query that looks at each element (from lowest id) and searches for other elements (with greater id to avoid duplicates) within the the delta of lat and log and save this to a temporary table.
Iterate over that table and do a full calculation to see if it is within the circle (not the box) of your distance.

SQL Query to Replace Multiple Matching Rows With a New Row

For simplicity, suppose I have a SQL table with one column, containing numeric data. Sample data is
11
13
21
22
23
3
31
32
33
41
42
131
132
133
141
142
143
If the table contains ALL the numbers of the form x1, x2, x3, but NOT x (where x is all of the digits of a number but the last digit. So for 123456, x would be 12345), then I want to replace these three rows with a new row, x.
The desired output of the above data would be:
11
13
2
3
31
32
33
41
42
131
132
133
14
How would I accomplish this with SQL? I should mention that I do not want to permanently alter the data - just for the query results.
Thanks.
I assume
the presence of to functions: lastDigit and head producing the last digit and the rest of the input value respectively
the data is unique
only the digits 1,2,3 are used for constructing the table values
the table is named t and has a single column x
you don't want this to work recursively
create a view n like this: select head(x) h, lastDigit(x) last from t You can use inline views instead
create a view c like this
select h
from n
group by h
having count(*) = 3
Then this should give the desired result:
select distinct x
from (
select x from t where head(x) not in (select h from c)
union
select h from c
)
You need two SQL commands, one to remove the existing rows and the second to add the new row. They would look like this (where :X is a parameter containing your base number, 14 in your example):
DELETE FROM YourTable WHERE YourColumn BETWEEN (:X*10) AND ((:X*10) + 9)
INSERT INTO YourTable (YourColumn) VALUES (:X)
Note: I assume you want all the numbers from x0 to x9 to be removed, so that's what I wrote above. If you really only want x1, x2, x3 removed, then you would use this DELETE statement instead:
DELETE FROM YourTable WHERE YourColumn BETWEEN ((:X*10) + 1) AND ((:X*10) + 3)