TOP1 in CROSS JOIN (SQL SERVER) - sql

I have a child table (position x, position y) and a parent table (position x, position y) in SQL Server. I want to find the closest parent for every child. I can do it the "bad way", but there is probably a solution that does not use any loops.
This is my code:
SELECT
child.idChild, child.x, child.y,
parent.idParent, parent.x, parent.y,
sqrt(power(child.x - parent.x, 2) + power(child.y - parent.y, 2)) as distance
FROM
child
CROSS JOIN
parent
ORDER BY
idChild, distance
OK, that's fine. But now I want to limit the result to only the TOP 1 parent for each child.
Thanks

A handy way to do this is with window functions. To get the top row, you can use either row_number() or rank(). They differ when there are ties: if several parents are equally close to a child, row_number() returns only one of them (chosen arbitrarily), whereas rank() returns all of them.
Here is one way to write the query:
select idChild, x, y, idParent, parentx, parenty
from (SELECT child.idChild, child.x, child.y,
parent.idParent, parent.x as parentx, parent.y as parenty,
ROW_NUMBER() over (partition by child.idchild
order by power(child.x - parent.x, 2) + power(child.y - parent.y, 2)
) as seqnum
FROM child CROSS JOIN
parent
) pc
where seqnum = 1;
I removed the sqrt() from the distance function because it is not necessary when looking for the smallest number.
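As a rough illustration of the query above, here is a small sketch that runs the same ROW_NUMBER() pattern against an in-memory SQLite database from Python. This is an assumption-laden stand-in, not SQL Server: it needs SQLite 3.25 or later for window functions, the sample coordinates are made up, and the squared distance is written with multiplication because POWER() is not always available in SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE child  (idChild  INTEGER, x REAL, y REAL);
    CREATE TABLE parent (idParent INTEGER, x REAL, y REAL);
    -- Made-up sample data for illustration only.
    INSERT INTO child  VALUES (1, 0, 0), (2, 10, 10);
    INSERT INTO parent VALUES (100, 1, 1), (200, 9, 9), (300, 5, 4);
""")

# Number the parents of each child by squared distance and keep the closest.
nearest = conn.execute("""
    SELECT idChild, idParent
    FROM (SELECT c.idChild, p.idParent,
                 ROW_NUMBER() OVER (
                     PARTITION BY c.idChild
                     ORDER BY (c.x - p.x) * (c.x - p.x)
                            + (c.y - p.y) * (c.y - p.y)
                 ) AS seqnum
          FROM child c CROSS JOIN parent p) pc
    WHERE seqnum = 1
    ORDER BY idChild
""").fetchall()

print(nearest)  # [(1, 100), (2, 200)]
```

Ordering by the squared distance picks the same parent as ordering by the distance itself, because the square root is monotonic for non-negative values.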

Related

Selecting one column of the Nth row from a referenced table in the select list

I have two tables: cross_section and layer.
Every cross section has multiple layers, which in turn have a position and a summary column. The positions (integer) are used to order each cross section's layers, but do not have to be consecutive (e.g. 10, 20, 40).
I need to extend the query to the cross_section table to include summaries of each cross section's layers (fixed number > 1) that preserves the order defined by the position attribute.
Most of the SQL gets generated automatically; I can only add to the select list extra elements containing subqueries.
This is what the automatically generated SQL looks like:
select cs.*, [first layer summary], [second layer summary]... from cross_section cs;
I have tried multiple different approaches, but none worked as expected (maybe this is simply not possible).
My current non-working statement looks like this:
---------------generated------------------------------------------
select cs.*,
---------------partial statement for column one-------------------
(select summary
from (select summary, l.cs_id
from layer l order by layer_position)
where cross_section.id like cs_id and rownum=1) layer_summary_1,
---------------partial statement for column two-------------------
(select summary
from (select summary, l.cs_id
from layer l order by layer_position)
where cross_section.id like cs_id and rownum=2) layer_summary_2
---------------generated------------------------------------------
from cross_section cs;
Right now, anything other than the first position comes back as null.
Edit:
Example output, as requested:
CSID, Stuff from cross section, layer_summary_1, layer_summary_2
12345, ..., stuff (null)
The problem is this clause: rownum=2.
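The rownum pseudo-column numbers rows as they are accepted into the result set, so a row only keeps its number once it has passed the WHERE clause. With rownum = 1 the first candidate row is numbered 1 and accepted, which is why the first position comes back as expected. With rownum = 2 the first candidate is also tentatively numbered 1, fails the test and is discarded, so the next candidate is numbered 1 again: no row can ever satisfy rownum = 2. (rownum <= 2 returns two rows, because a row with rownum = 1 is accepted first.)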
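The behaviour described here can be mimicked with a few lines of Python. This is only a toy model of how ROWNUM is assigned, not Oracle's actual implementation; the function name and the sample layer names are made up for illustration.

```python
def filter_with_rownum(rows, predicate):
    """Toy model of ROWNUM: each candidate row is offered the next number,
    and the counter only advances when the row passes the predicate."""
    kept = []
    rownum = 1
    for row in rows:
        if predicate(row, rownum):
            kept.append(row)
            rownum += 1
    return kept

layers = ["top", "middle", "bottom"]  # hypothetical, already ordered

first = filter_with_rownum(layers, lambda row, rn: rn == 1)
second = filter_with_rownum(layers, lambda row, rn: rn == 2)
first_two = filter_with_rownum(layers, lambda row, rn: rn <= 2)

print(first)      # ['top']
print(second)     # []
print(first_two)  # ['top', 'middle']
```

In the rn == 2 case every row is offered the number 1 and rejected, so the counter never moves and nothing is returned.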
The solution is simple: use the analytic row_number() function in your subqueries which you can reference in the main WHERE clause:
---------------generated------------------------------------------
select cs.*,
---------------partial statement for column one-------------------
(select summary
from (select summary, l.cs_id
, row_number() over (order by layer_position) as rn
from layer l order by layer_position)
where cross_section.id like cs_id and rn=1) layer_summary_1,
---------------partial statement for column two-------------------
(select summary
from (select summary, l.cs_id
, row_number() over (order by layer_position) as rn
from layer l order by layer_position)
where cross_section.id like cs_id and rn=2) layer_summary_2
---------------generated------------------------------------------
from cross_section cs;
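Your question isn't clear about the data or business logic, so the above may produce incorrect results. As written, row_number() numbers the layers across the whole table; if the numbering needs to restart for each cross section, add PARTITION BY criteria to the window clause, for example: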
, row_number() over (partition by cs_id order by layer_position) as rn
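To see what difference the PARTITION BY makes, here is a small sketch using an in-memory SQLite database from Python. It is an assumption, not the asker's Oracle setup: it needs SQLite 3.25 or later, the table contents are invented, and the helper layer_summaries is hypothetical. The correlation is written as cs.id = cs_id for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cross_section (id INTEGER);
    CREATE TABLE layer (cs_id INTEGER, layer_position INTEGER, summary TEXT);
    -- Invented sample data: positions are ordered but not consecutive.
    INSERT INTO cross_section VALUES (1), (2);
    INSERT INTO layer VALUES
        (1, 10, 'clay'), (1, 20, 'sand'), (1, 40, 'rock'),
        (2, 5, 'silt'), (2, 30, 'gravel');
""")

def layer_summaries(window):
    # `window` is the body of the OVER (...) clause being compared.
    nth = """(SELECT summary
              FROM (SELECT summary, cs_id,
                           ROW_NUMBER() OVER ({window}) AS rn
                    FROM layer) numbered
              WHERE numbered.cs_id = cs.id AND rn = {n})"""
    sql = ("SELECT cs.id, {} AS layer_summary_1, {} AS layer_summary_2 "
           "FROM cross_section cs ORDER BY cs.id").format(
        nth.format(window=window, n=1), nth.format(window=window, n=2))
    return conn.execute(sql).fetchall()

global_numbering = layer_summaries("ORDER BY layer_position")
per_section = layer_summaries("PARTITION BY cs_id ORDER BY layer_position")

print(global_numbering)  # [(1, None, 'clay'), (2, 'silt', None)]
print(per_section)       # [(1, 'clay', 'sand'), (2, 'silt', 'gravel')]
```

Without the partition, the row numbers run across all layers of all cross sections, so most lookups land on a layer that belongs to a different cross section and come back as null.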

Sort a list but with pre-determined override values using SQL

The business problem is a bit obtuse so I won't get into the details.
I have to come up with a sort index for a set of keys, but some of those keys have a pre-determined position in the index which must be respected. The remaining keys have to be ordered as normal but "around" the pre-determined ones.
Simple example is to sort the letters A through E, except that A must be position 3 and D must be position 1. The result I want to achieve is:
A: 3 B: 2 C: 4 D: 1 E: 5
DDL to set up sample:
CREATE TABLE test.element (element_key TEXT, override_sort_idx INTEGER);
insert into test.element VALUES ('A', 3), ('B', Null), ('C', NULL), ('D', 1), ('E', NULL);
The best solution I can come up with is this, but although it appears to work for this simple example, it goes wrong in the general case - it falls apart if you add some more pre-defined values [EDIT - it doesn't even work in this example because A comes out as 4 - apologies]:
WITH inner_sort AS (SELECT element_key, override_sort_idx, row_number()
OVER (ORDER BY element_key) AS natural_sort_idx
FROM test.element)
SELECT element_key, row_number()
OVER
(ORDER BY
CASE
WHEN override_sort_idx IS NULL
THEN natural_sort_idx
ELSE override_sort_idx END) AS hybrid_sort
FROM inner_sort;
Any ideas for a solution that works in the general case?
This proved to be more of a challenge than I initially expected.
But this SQL returns the expected results:
WITH OPENNUMBERS AS
(
select row_number() over () as num
from test.element
except
select override_sort_idx
from test.element
where override_sort_idx is not null
)
, OPENNUMBERS2 AS
(
select num, row_number() over (order by num) as rn
from OPENNUMBERS
)
,NORMALS AS
(
select element_key, row_number() over (order by element_key) as rn
from test.element
where override_sort_idx is null
)
select n.element_key, o.num as hybrid_sort_idx
from OPENNUMBERS2 o
join NORMALS n ON n.rn = o.rn
union all
select element_key, override_sort_idx
from test.element
where override_sort_idx is not null
order by hybrid_sort_idx;
You can test it here on SQL Fiddle.
The trick used?
Get a list of the index numbers that are still free after you remove the overridden ones (using EXCEPT).
Then get a row_number for those numbers and also for the non-overridden keys.
Join those on the row number.
Then stitch the overridden ones onto the result.
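The same steps can be sketched outside SQL to make the idea easier to follow. This is a plain Python illustration of the approach, not a replacement for the query; the function name hybrid_sort_index is made up.

```python
def hybrid_sort_index(elements):
    """elements: list of (key, override_or_None). Returns {key: position}."""
    # Keys with a pre-determined position keep it.
    overrides = {key: idx for key, idx in elements if idx is not None}
    taken = set(overrides.values())
    # Index numbers 1..n that are still free (the EXCEPT step).
    free_slots = [i for i in range(1, len(elements) + 1) if i not in taken]
    # Remaining keys in their natural order.
    normals = sorted(key for key, idx in elements if idx is None)
    # Pair the i-th remaining key with the i-th free slot (the join on rn).
    result = dict(zip(normals, free_slots))
    # Stitch the overridden keys back in (the UNION ALL step).
    result.update(overrides)
    return result

sample = [("A", 3), ("B", None), ("C", None), ("D", 1), ("E", None)]
index = hybrid_sort_index(sample)
print(sorted(index.items()))
# [('A', 3), ('B', 2), ('C', 4), ('D', 1), ('E', 5)]
```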
Posting this since it works, and I think it's the best I can do, but it's pretty horrific.
WITH grouped AS (
SELECT element_key, override_sort_idx,
row_number() OVER (
PARTITION BY override_sort_idx IS NULL
ORDER BY override_sort_idx, element_key)
AS group_idx,
row_number() OVER (ORDER BY element_key) AS natural_sort_idx
FROM test.element),
remaining_idx AS (
SELECT row_number() OVER () AS remain_idx FROM test.element
EXCEPT
SELECT override_sort_idx FROM test.element),
indexed_remaining AS (
SELECT row_number() OVER (ORDER BY remain_idx) AS r_sort_idx,
remain_idx
FROM remaining_idx)
SELECT g.element_key,
coalesce(g.override_sort_idx, r.remain_idx) AS hybrid_index
FROM grouped g
LEFT JOIN indexed_remaining r ON
(CASE WHEN g.override_sort_idx IS NULL
THEN g.group_idx END = r.r_sort_idx)
ORDER BY hybrid_index
This involves creating the "remaining" index values first as the difference between a simple row_number() and the pre-determined index values, which is then joined to a sorted list of keys without pre-determined index values.
The CASE statement in the JOIN is functionally unnecessary given the order of the coalesce but it seems like the "purer" approach.
I have a feeling that someone smarter than me, who understands window functions properly, could write this using window functions with filters, or manipulating the range of a window function, without the crazy nested subqueries/CTEs and joins.

SQLServer: LAG & LEAD instead of recursive calculation

I am pretty new to SQL Server 2016 and haven't used the LAG & LEAD functions yet.
If I understood correctly, they make work easier in cases where we currently use the ROW_NUMBER() function and then join the results to connect the records in a certain order.
One case where I currently connect the records this way is:
;WITH IncrementingRowNums AS
(
SELECT d.MyKey
,d.Outstanding
,d.Rate
,AMO.PaymentAmount
,AMO.AmoDate
,ROW_NUMBER() OVER (PARTITION BY d.MyKey ORDER BY AMO.AmoDate ASC) AS RowNum
FROM Deals d
INNER JOIN Amortization AMO
ON d.MyKey = AMO.MyKey
),
lagged AS
(
SELECT MyKey
,Outstanding AS new_outstanding
,Rate
,PaymentAmount
,AmoDate
,RowNum
FROM IncrementingRowNums
WHERE RowNum = 1
UNION ALL
SELECT i.MyKey
,(l.new_outstanding - l.PaymentAmount)
* (1 + i.Rate * (DATEDIFF(DAY,l.AmoDate, i.AmoDate)/365.25))
AS new_outstanding
,i.Rate
,i.PaymentAmount
,i.AmoDate
,i.RowNum
FROM IncrementingRowNums i
INNER JOIN lagged l
ON i.RowNum = l.RowNum + 1
AND i.MyKey = l.MyKey
)
SELECT MyKey, AmoDate, new_outstanding
FROM lagged
ORDER BY MyKey, RowNum;
What's the best way to solve this with the LAG and LEAD functions?
I tried several ways, but it never worked out.
The only thing I want to calculate is the column new_outstanding.
It is calculated as:
(previous_record.new_outstanding - previous_record.PaymentAmount)
* (1 + current_record.Rate * (DATEDIFF(DAY,previous_record.AmoDate, current_record.AmoDate)/365.25))
As there is no SQL Server 2016 version on rextester, I can only provide a little test data and my old recursive solution: http://rextester.com/WVTM46505
Thanks
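For reference, the recurrence in the question can be written as a simple loop. The sketch below is plain Python with invented sample numbers and a hypothetical function name; it only illustrates the formula. Note that each new_outstanding depends on the previously computed new_outstanding, whereas LAG reads a column of the previous input row, which is probably why a direct LAG rewrite is awkward.

```python
from datetime import date

def amortize(outstanding, payments):
    """payments: list of (amo_date, payment_amount, rate) sorted by date.
    Returns the new_outstanding value for each row."""
    results = []
    prev_outstanding = prev_payment = prev_date = None
    for i, (amo_date, payment, rate) in enumerate(payments):
        if i == 0:
            # Anchor row: RowNum = 1 starts from the deal's Outstanding.
            new_outstanding = outstanding
        else:
            days = (amo_date - prev_date).days
            new_outstanding = (prev_outstanding - prev_payment) * (
                1 + rate * days / 365.25)
        results.append(new_outstanding)
        prev_outstanding, prev_payment, prev_date = (
            new_outstanding, payment, amo_date)
    return results

# Invented sample: the rate is chosen so 10 days of interest is about 1%.
schedule = amortize(1000.0, [
    (date(2020, 1, 1), 100.0, 0.36525),
    (date(2020, 1, 11), 100.0, 0.36525),
    (date(2020, 1, 21), 100.0, 0.36525),
])
print([round(v, 2) for v in schedule])  # [1000.0, 909.0, 817.09]
```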

T-SQL to find length of time a particular value is in-range

I have a table in SQL Server where, for each row r at time t, I would like to find the first t + i for some function of r where abs(f(r, t + i) - f(r, t)) > epsilon.
I can imagine doing this with two cursors, but this seems highly inefficient.
Any of the T-SQL gurus out there have any advice?
select a.t, x.t as next_t --- and whatever other columns you need
from tbl a -- implicitly each row of table
cross apply (
select top(1) * -- the first, going upwards along b.t
from tbl b
where a.t < b.t -- look for records with later t than the source row
and <here's your function between a row and b row>
order by b.t asc
) x
I'm not a big fan of correlated subqueries. But, it seems useful in this case. The following code returns the minimum "sequence number" of the first row after the given row subject to your condition:
with t as (
select t.*, f(r, t) as fval, row_number() over (order by <ordering>) as seqnum
from table t
)
select t.*,
(select min(t2.seqnum)
from t t2
where t2.seqnum > t.seqnum and
abs(t2.fval - t.fval) > <epsilon>
) as next_seqnum
from t
To make this work, you need to specify <ordering> and <epsilon>. <ordering> is how you know the order of the rows (t would be my guess).
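The logic both answers implement, "for each row, find the first later row whose value has moved by more than epsilon", can be sketched in plain Python. The function name and the sample values are made up, and the precomputed fval stands in for f(r, t).

```python
def first_out_of_range(values, epsilon):
    """values: list of (t, fval) sorted by t. For each row, return the t of
    the first later row whose fval differs by more than epsilon, or None."""
    result = []
    for i, (t, fval) in enumerate(values):
        # Scan forward in t order and stop at the first row out of range.
        nxt = next((t2 for t2, f2 in values[i + 1:]
                    if abs(f2 - fval) > epsilon), None)
        result.append((t, nxt))
    return result

samples = [(1, 10.0), (2, 10.5), (3, 12.0), (4, 10.2)]
ranges = first_out_of_range(samples, epsilon=1.0)
print(ranges)  # [(1, 3), (2, 3), (3, 4), (4, None)]
```

Subtracting the two t values in each pair gives the length of time the value stayed in range.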

Select nearest but not greater-than row

Given a table action(start:DATE, length:NUMBER, type:NUMBER), with all records unique, I need to select (e.g.) length of last action with type Y before time X:
select action.length
where action.type = Y
and action.start is the biggest, but not greater than X
Proposed solution (improved):
with actionView as (select * from action where type = Y and time <= X)
select length
from actionView
where time = (select max(time) from actionView)
But this still involves two selects.
Is it possible to use some analytic, hierarchical, or other Oracle magic to improve this query?
(Probably, something like this algo is what I need, but I don't know how to express it in SQL:
savedAction.time = MinimalTime
foreach action in actions
if action.type = y and savedAction.time < action.time <= X
savedAction = action
return savedAction;
)
Oracle (before 12c, which added FETCH FIRST n ROWS ONLY) has no LIMIT (PostgreSQL, MySQL) or TOP (SQL Server) clause like other RDBMSs. But you can use ROWNUM for that:
SELECT *
FROM (
SELECT length
FROM action
WHERE type = Y
AND start <= X -- "not greater than X"
ORDER BY start DESC
)
WHERE rownum = 1;
This way, the table will be queried once only.
The details are in the manual.
In reply to Dems' comment, I quote from the link above:
If you embed the ORDER BY clause in a subquery and place the ROWNUM
condition in the top-level query, then you can force the ROWNUM
condition to be applied after the ordering of the rows.
You can use ROW_NUMBER() to evaluate this in a single scan...
WITH
sequenced_data
AS
(
SELECT
ROW_NUMBER() OVER (PARTITION BY x ORDER BY start DESC) AS sequence_id,
action.*
FROM
action
WHERE
type = Y
AND start <= Z
)
SELECT
*
FROM
sequenced_data
WHERE
sequence_id = 1
You don't need the PARTITION BY here, but it is what you would use to get the 'max' row per group (such as per person, or per item, in your database).
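As an illustration of the per-group case, the sketch below runs the same ROW_NUMBER() pattern against an in-memory SQLite database from Python, partitioning by type to get the latest action per type at or before a cut-off. This is an assumption, not Oracle: it needs SQLite 3.25 or later, the rows are invented, and dates are stored as ISO strings so that they sort correctly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE action (start TEXT, length INTEGER, type INTEGER);
    -- Invented sample data for illustration only.
    INSERT INTO action VALUES
        ('2020-01-01', 5, 1), ('2020-02-01', 7, 1), ('2020-03-01', 9, 1),
        ('2020-01-15', 3, 2), ('2020-02-15', 4, 2);
""")

# Within each type, number the rows from latest to earliest and keep row 1.
latest_per_type = conn.execute("""
    SELECT type, start, length
    FROM (SELECT a.*,
                 ROW_NUMBER() OVER (PARTITION BY type
                                    ORDER BY start DESC) AS sequence_id
          FROM action a
          WHERE start <= :x) sequenced
    WHERE sequence_id = 1
    ORDER BY type
""", {"x": "2020-02-10"}).fetchall()

print(latest_per_type)  # [(1, '2020-02-01', 7), (2, '2020-01-15', 3)]
```

Adding a filter on type and dropping the PARTITION BY reduces this to the single-row case from the question.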
I don't know about any magic, but:
with typed_action as (select length, time from action where type = Y and time <= X)
select length from typed_action
where time = (select max(time) from typed_action)
This gives you fewer "where" clauses to execute and a (much?) smaller temporary typed_action table.