How to apply a recursive query to the whole table? - sql

This post is related to another question of mine. I came up with a recursive query that does basically what I want: the recursion keeps running as long as the counter has not exceeded the value of the dist_calc_points attribute. But this only works for one entry (see the WHERE v2_channel.id=2 clause). How can I apply this query to the whole table?
WITH RECURSIVE dist(x, the_geom, d) AS (
SELECT
0::double precision,
the_geom,
0::double precision
FROM v2_channel where v2_channel.id=2
UNION ALL
SELECT
x+1,
v2_channel.the_geom AS gm,
d+(1/v2_channel.dist_calc_points) AS dist_calc_pnts
FROM v2_channel, dist
WHERE dist.x<v2_channel.dist_calc_points AND v2_channel.id=2
)
SELECT *, ST_AsText(ST_LineInterpolatePoint(the_geom, d)) FROM dist;

To allow the CTE to apply to multiple rows, you have to be able to identify these rows. So just add the ID:
WITH RECURSIVE dist(id, x, the_geom, d) AS (
SELECT
id,
0::double precision,
the_geom,
0::double precision
FROM v2_channel
UNION ALL
SELECT
dist.id,
x+1,
v2_channel.the_geom AS gm,
d+(1.0/v2_channel.dist_calc_points) AS dist_calc_pnts -- 1.0 avoids integer division if dist_calc_points is an integer column
FROM v2_channel JOIN dist
ON dist.x < v2_channel.dist_calc_points
AND dist.id = v2_channel.id
)
SELECT *, ST_AsText(ST_LineInterpolatePoint(the_geom, d)) FROM dist;
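For comparison, here is a minimal, runnable sketch of the per-id recursion using Python's built-in sqlite3 module. It assumes a stripped-down, hypothetical v2_channel with only id and dist_calc_points, because SQLite has no PostGIS geometry type or ST_LineInterpolatePoint; it stops at the (id, x, d) rows that the final SELECT would pass to ST_LineInterpolatePoint.

```python
import sqlite3

# Hypothetical stand-in for v2_channel: only id and dist_calc_points are kept,
# because SQLite has no PostGIS (no the_geom / ST_LineInterpolatePoint).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE v2_channel (id INTEGER PRIMARY KEY, dist_calc_points INTEGER)")
conn.executemany("INSERT INTO v2_channel VALUES (?, ?)", [(1, 2), (2, 4)])

rows = conn.execute("""
    WITH RECURSIVE dist(id, x, d) AS (
        -- anchor: one starting row per channel, not just id = 2
        SELECT id, 0, 0.0 FROM v2_channel
        UNION ALL
        -- recursive step: each channel advances independently, matched on id
        SELECT dist.id, dist.x + 1, dist.d + 1.0 / c.dist_calc_points
        FROM dist
        JOIN v2_channel AS c
          ON c.id = dist.id AND dist.x < c.dist_calc_points
    )
    SELECT id, x, d FROM dist ORDER BY id, x
""").fetchall()

for row in rows:
    print(row)
```

Each channel gets dist_calc_points + 1 rows, with d running from 0 to 1, which is the fraction range ST_LineInterpolatePoint expects.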

Related

Generate group code between two values of the group with SQL

I have a big issue: I need to generate a code for every value in the range given by two existing columns (CodeFrom / CodeTo), as in the screenshots below.
Input: [screenshot]
Expected output: [screenshot]
Any ideas would help, for sure. Thanks
In SQL Server, you can use a recursive CTE:
with cte as (
select codefrom, codeto, town, codefrom as code
from t
union all
select codefrom, codeto, town, code + 1
from cte
where code < codeto
)
select *
from cte;
SQL Server has a default recursion limit of 100. So if any single CodeFrom/CodeTo range may span more than 100 codes, add OPTION (MAXRECURSION 0) at the end of the query.
As I mentioned in the comments under Gordon's answer, use a tally table for this. Tallies are far faster (especially with larger datasets) and don't hit the maximum recursion error, as they aren't recursive:
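A minimal sketch of the same recursive CTE, run through Python's built-in sqlite3 module so it is self-contained. SQLite stands in for SQL Server here (an assumption): SQLite spells it WITH RECURSIVE, while SQL Server uses plain WITH plus the MAXRECURSION hint for long ranges. The sample rows are the ones used in the tally-table answer below, and the table name t follows this answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (codefrom INTEGER, codeto INTEGER, town TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", [(1, 7, "Paris"), (14, 17, "Sao Paulo")])

codes = conn.execute("""
    WITH RECURSIVE cte(codefrom, codeto, town, code) AS (
        SELECT codefrom, codeto, town, codefrom FROM t      -- start at CodeFrom
        UNION ALL
        SELECT codefrom, codeto, town, code + 1 FROM cte    -- step by 1
        WHERE code < codeto                                 -- stop at CodeTo
    )
    SELECT town, code FROM cte ORDER BY town, code
""").fetchall()

for town, code in codes:
    print(town, code)
```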
WITH N AS(
SELECT N
FROM (VALUES(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL))N(N)),
Tally AS(
SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS I
FROM N N1, N N2, N N3) --1,000 rows, Add more N for more rows
SELECT YT.CodeFrom,
YT.CodeTo,
YT.Town,
T.I AS Code
FROM (VALUES(1,7,'Paris'),
(14,17,'Sao Paulo'))YT(CodeFrom,CodeTo,Town)
JOIN Tally T ON YT.CodeFrom <= T.I
AND YT.CodeTo >= T.I;
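The tally approach can be sketched the same way, with SQLite (via Python's sqlite3) as an assumed stand-in for SQL Server; window functions need SQLite 3.25 or later. SQLite accepts an empty OVER (), so the ORDER BY (SELECT NULL) trick is not needed there. The ten-row VALUES list cross-joined three times gives 1,000 numbers, and the range join picks the codes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yt (codefrom INTEGER, codeto INTEGER, town TEXT)")
conn.executemany("INSERT INTO yt VALUES (?, ?, ?)", [(1, 7, "Paris"), (14, 17, "Sao Paulo")])

codes = conn.execute("""
    WITH n(x) AS (
        -- 10 placeholder rows
        VALUES (NULL), (NULL), (NULL), (NULL), (NULL),
               (NULL), (NULL), (NULL), (NULL), (NULL)
    ),
    tally(i) AS (
        -- 10 x 10 x 10 = 1,000 sequential numbers, no recursion involved
        SELECT ROW_NUMBER() OVER () FROM n AS n1, n AS n2, n AS n3
    )
    SELECT yt.town, tally.i AS code
    FROM yt
    JOIN tally ON yt.codefrom <= tally.i AND yt.codeto >= tally.i
    ORDER BY yt.town, code
""").fetchall()

for town, code in codes:
    print(town, code)
```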

SQL to show one result calculated by the other values?

It seems we can use a SQL statement like:
select
(
select
count(*) as c_foos
from
foos
),
(
select
count(*) as c_bars
from
bars
);
but we can't do
select
(
select
count(*) as c_foos
from
foos
),
(
select
count(*) as c_bars
from
bars
),
(
select
(c_foos / c_bars) as the_ratio
);
or
select
(
select
count(*) as c_foos
from
foos
),
(
select
count(*) as c_bars
from
bars
),
(c_foos / c_bars) as the_ratio;
Is there a way to do that showing all 3 numbers? Is there a more definite rule as to what can be done and what can't?
You can try this: define two CTEs (cte_num and cte_den) in a WITH clause, so that the main query can use both results:
WITH
cte_num AS (
SELECT count(*) as c_foos
FROM foos
),
cte_den AS (
SELECT count(*) as c_bars
FROM bars
)
SELECT
cte_num.c_foos,
cte_den.c_bars,
cte_num.c_foos / cte_den.c_bars as the_ratio
from cte_num, cte_den;
There is a small number of simple rules... but SQL seems so easy that most programmers prefer to cut to the chase, and later complain they didn't get the plot :)
You can think of a query as a description of a flow: columns in a select share inputs (defined in from), but are evaluated "in parallel", without seeing each other. Your complex example boils down to the fact that you cannot do this:
select 1 as a, 2 as b, a + b;
Fields a and b are defined as outputs of the query, but there are no inputs called a and b. All you have to do is modify the query so that a and b become inputs:
select a + b from (select 1 as a, 2 as b) as inputs
And this will work (this is, by the way, the solution for your queries).
Addendum:
The confusion comes from the fact that in most SQL 101 cases, outputs are created directly from inputs (data just passes through).
This flow model is useful, because it makes things easier to reason about in more complex cases. Also, we avoid ambiguities and loops. You can think about it in the context of a query like: select name as last_name, last_name as name, name || ' ' || last_name from person;
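The inputs/outputs rule is easy to check with Python's built-in sqlite3 module. This sketch assumes SQLite, which rejects the alias reference with a "no such column" error; other engines phrase the error differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Referencing output aliases from the same SELECT list fails:
# a and b are outputs of this query, not inputs to it.
error = None
try:
    conn.execute("SELECT 1 AS a, 2 AS b, a + b").fetchall()
except sqlite3.OperationalError as exc:
    error = exc
print("direct version:", error)

# Turning the outputs into inputs via a derived table works.
result = conn.execute(
    "SELECT a, b, a + b AS total FROM (SELECT 1 AS a, 2 AS b) AS inputs"
).fetchone()
print("derived-table version:", result)
```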
Move the subqueries to the FROM clause:
select f.c_foos, b.c_bars, f.c_foos / b.c_bars
from (select count(*) as c_foos from foos
) f cross join
(select count(*) as c_bars from bars
) b;
Ironically, your first version will work in MySQL. I don't actually think this is intentional. I think it is an artifact of their parser, meaning that it happens to work but might stop working in future versions.
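A runnable sketch of the FROM-clause version, using Python's sqlite3 and two hypothetical foos/bars tables. It also shows a pitfall worth checking in your engine: in SQLite, as in PostgreSQL and SQL Server, dividing one integer count by another truncates, so multiplying by 1.0 forces a fractional ratio.

```python
import sqlite3

# Hypothetical foos/bars tables with 3 and 2 rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foos (id INTEGER)")
conn.execute("CREATE TABLE bars (id INTEGER)")
conn.executemany("INSERT INTO foos VALUES (?)", [(1,), (2,), (3,)])
conn.executemany("INSERT INTO bars VALUES (?)", [(1,), (2,)])

row = conn.execute("""
    SELECT f.c_foos,
           b.c_bars,
           f.c_foos / b.c_bars       AS int_ratio,   -- integer / integer truncates
           1.0 * f.c_foos / b.c_bars AS the_ratio    -- force real division
    FROM (SELECT COUNT(*) AS c_foos FROM foos) AS f
    CROSS JOIN (SELECT COUNT(*) AS c_bars FROM bars) AS b
""").fetchone()
print(row)
```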
The simplest way is to use a CTE that returns the 2 columns:
with cte as (
select
(select count(*) from foos) as c_foos,
(select count(*) from bars) as c_bars
)
select c_foos, c_bars, (c_foos / c_bars) as the_ratio
from cte
Note that the aliases of the 2 columns must be set outside each subquery's parentheses, not inside them.
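A sketch of the CTE-with-scalar-subqueries version, run through Python's sqlite3 with hypothetical foos/bars tables. CAST(... AS REAL) is there because, in SQLite, dividing two integer counts would otherwise truncate; the column aliases sit outside the subquery parentheses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foos (id INTEGER)")
conn.execute("CREATE TABLE bars (id INTEGER)")
conn.executemany("INSERT INTO foos VALUES (?)", [(i,) for i in range(6)])
conn.executemany("INSERT INTO bars VALUES (?)", [(i,) for i in range(4)])

row = conn.execute("""
    WITH cte AS (
        SELECT
            (SELECT COUNT(*) FROM foos) AS c_foos,  -- alias outside the parentheses
            (SELECT COUNT(*) FROM bars) AS c_bars
    )
    SELECT c_foos, c_bars, CAST(c_foos AS REAL) / c_bars AS the_ratio
    FROM cte
""").fetchone()
print(row)
```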

Oracle 'Invalid Identifier' in sub-query

I'm having an issue with converting a view from PostgreSQL to Oracle when a sub-query is referencing a column in the outer query.
This issue seems to have been discussed here several times but I have been unable to get any of the fixes to work with my specific query.
The query's purpose is to get a mobile device's last recorded position and the distance in km from its closest checkpoint/geo-boundary. It references 3 separate tables: devices, device_locations and checkpoints.
SELECT
d.id,
dl.latitude AS last_latitude,
dl.longitude AS last_longitude,
(SELECT * /* Get closest 'checkpoint' to the last device position by calculating the Great-circle distance */
FROM (
SELECT
6371 * acos(cos(dl.latitude / (180/acos(-1))) * cos(checkpoints.latitude / (180/acos(-1))) * cos((checkpoints.longitude / (180/acos(-1))) - (dl.longitude / (180/acos(-1)))) + sin(dl.latitude / (180/acos(-1))) * sin(checkpoints.latitude / (180/acos(-1)))) AS distance
FROM checkpoints
ORDER BY distance)
WHERE ROWNUM = 1) AS distance_to_checkpoint
FROM devices d
LEFT JOIN ( /* Get the last position of the device */
SELECT l.id,
l.time,
l.latitude,
l.longitude,
l.accuracy
FROM device_locations l
WHERE l.ROWID IN (SELECT MAX(ROWID) FROM device_locations GROUP BY id)
ORDER BY l.id, l.time DESC) dl
ON dl.id = d.id;
I've been stuck on this for a while and hoping someone can put me on the right path, thanks.
This is a follow-up to my other answer. To get the checkpoint record with the minimum distance, you'd join with the checkpoints table and use window functions again to pick the best record, e.g.:
select
device_id,
last_latitude,
last_longitude,
checkpoint_latitude,
checkpoint_longitude,
distance
from
(
select
device_id,
last_latitude,
last_longitude,
checkpoint_latitude,
checkpoint_longitude,
distance,
min(distance) over (partition by device_id) as min_distance
from
(
select
d.id as device_id,
dl.latitude as last_latitude,
dl.longitude as last_longitude,
cp.latitude as checkpoint_latitude,
cp.longitude as checkpoint_longitude,
6371 *
acos(cos(dl.latitude / (180/acos(-1))) *
cos(cp.latitude / (180/acos(-1))) *
cos((cp.longitude / (180/acos(-1))) - (dl.longitude / (180/acos(-1))))
+
sin(dl.latitude / (180/acos(-1))) *
sin(cp.latitude / (180/acos(-1)))
) as distance
from devices d
left join
(
select
id as device_id, latitude, longitude, time,
max(time) over (partition by id) as max_time
from device_locations
) dl on dl.device_id = d.id and dl.time = dl.max_time
cross join checkpoints cp
)
)
where (distance = min_distance) or (distance is null and min_distance is null);
Such queries are easier to write with CROSS APPLY and OUTER APPLY, available as of Oracle 12c.
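To try the window-function approach without an Oracle instance, here is a sketch with Python's sqlite3 (SQLite 3.25+ for window functions). Assumptions: the spherical-law-of-cosines formula is registered as a hypothetical Python function gc_km instead of being written inline, with its cosine clamped to [-1, 1] so rounding cannot push acos out of its domain; checkpoints gets a hypothetical name column so the result is readable; the sample data is invented.

```python
import math
import sqlite3


def great_circle_km(lat1, lon1, lat2, lon2):
    """Spherical law of cosines with R = 6371 km, as in the queries above."""
    if None in (lat1, lon1, lat2, lon2):
        return None  # behave like SQL: NULL in, NULL out
    p1, p2 = math.radians(lat1), math.radians(lat2)
    c = (math.cos(p1) * math.cos(p2) * math.cos(math.radians(lon2 - lon1))
         + math.sin(p1) * math.sin(p2))
    return 6371 * math.acos(max(-1.0, min(1.0, c)))  # clamp float rounding


conn = sqlite3.connect(":memory:")
conn.create_function("gc_km", 4, great_circle_km)
conn.executescript("""
    CREATE TABLE devices (id INTEGER PRIMARY KEY);
    CREATE TABLE device_locations (id INTEGER, time INTEGER, latitude REAL, longitude REAL);
    CREATE TABLE checkpoints (name TEXT, latitude REAL, longitude REAL);
    INSERT INTO devices VALUES (1), (2);
    -- device 1 moved from (0, 0) to (10, 10); only the later fix should count
    INSERT INTO device_locations VALUES (1, 1, 0, 0), (1, 2, 10, 10), (2, 5, 50, 50);
    INSERT INTO checkpoints VALUES ('A', 10, 11), ('B', 50, 49), ('C', 0, 0);
""")

nearest = conn.execute("""
    SELECT device_id, checkpoint, distance
    FROM (
        SELECT device_id, checkpoint, distance,
               MIN(distance) OVER (PARTITION BY device_id) AS min_distance
        FROM (
            SELECT d.id AS device_id, cp.name AS checkpoint,
                   gc_km(dl.latitude, dl.longitude, cp.latitude, cp.longitude) AS distance
            FROM devices d
            LEFT JOIN (
                SELECT id AS device_id, latitude, longitude, time,
                       MAX(time) OVER (PARTITION BY id) AS max_time
                FROM device_locations
            ) dl ON dl.device_id = d.id AND dl.time = dl.max_time
            CROSS JOIN checkpoints cp
        )
    )
    WHERE distance = min_distance OR (distance IS NULL AND min_distance IS NULL)
    ORDER BY device_id
""").fetchall()

for row in nearest:
    print(row)
```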
I see two issues:
Extra comma after your final select column: AS distance_to_checkpoint,
Outer select columns reference an inner table, device_locations l, instead of the derived table dl. Example: l.latitude should be dl.latitude
First of all: The query doesn't get the last device positions. It gets the records with the highest ROWID per ID which may happen to be the latest entry, but is not at all guaranteed to be.
Then you most probably have an issue with scope. Unfortunately, names are only valid one level deep, which is an annoying limitation. dl.latitude etc. are probably not valid in your subquery, because it's actually a subquery within a subquery. Anyway, what you are trying to get is the minimum distance, which you can easily get with MIN.
An ORDER BY in a subquery is superfluous in standard SQL. Oracle makes an exception for their ROWNUM technique, but I wouldn't make use of this. (And as mentioned, it's even clumsy for getting a minimum value.) The ORDER BY in the outer-joined subquery is superfluous anyway.
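The ROWID caveat can be illustrated with Python's sqlite3. SQLite's rowid is not Oracle's physical ROWID, so this is only an analogy: in both cases the row identifier reflects how rows were stored, not the time column. The out-of-order insert below is hypothetical (think of a delayed upload).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE device_locations (id INTEGER, time INTEGER, latitude REAL, longitude REAL)")
# The newer fix (time 20) is stored before the older one (time 10),
# so the older row ends up with the higher rowid.
conn.executemany(
    "INSERT INTO device_locations VALUES (?, ?, ?, ?)",
    [(1, 20, 10.0, 10.0), (1, 10, 0.0, 0.0)],
)

# The question's technique: highest row identifier per device.
by_rowid = conn.execute("""
    SELECT id, time FROM device_locations
    WHERE rowid IN (SELECT MAX(rowid) FROM device_locations GROUP BY id)
""").fetchall()

# Picking the row with the latest time per device instead.
by_time = conn.execute("""
    SELECT id, time FROM (
        SELECT id, time, MAX(time) OVER (PARTITION BY id) AS max_time
        FROM device_locations
    ) WHERE time = max_time
""").fetchall()

print("max rowid:", by_rowid)
print("max time: ", by_time)
```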
This is how I would approach the problem:
select
d.id as device_id,
dl.latitude as last_latitude,
dl.longitude as last_longitude,
(
select min(6371 *
acos(cos(dl.latitude / (180/acos(-1))) *
cos(cp.latitude / (180/acos(-1))) *
cos((cp.longitude / (180/acos(-1))) - (dl.longitude / (180/acos(-1))))
+
sin(dl.latitude / (180/acos(-1))) *
sin(cp.latitude / (180/acos(-1)))
)
)
from checkpoints cp
) as distance
from devices d
left join
(
select
id as device_id, latitude, longitude, time,
max(time) over (partition by id) as max_time
from device_locations
) dl on dl.device_id = d.id and dl.time = dl.max_time;
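A sketch of this MIN-in-a-scalar-subquery approach with Python's sqlite3 standing in for Oracle (SQLite 3.25+ for the window function). The great-circle formula is registered as a hypothetical Python function gc_km, and the data is invented. Device 3 has no locations, so the outer join keeps it with NULL coordinates and a NULL distance.

```python
import math
import sqlite3


def great_circle_km(lat1, lon1, lat2, lon2):
    """Spherical law of cosines with R = 6371 km; NULL in, NULL out."""
    if None in (lat1, lon1, lat2, lon2):
        return None
    p1, p2 = math.radians(lat1), math.radians(lat2)
    c = (math.cos(p1) * math.cos(p2) * math.cos(math.radians(lon2 - lon1))
         + math.sin(p1) * math.sin(p2))
    return 6371 * math.acos(max(-1.0, min(1.0, c)))  # clamp float rounding


conn = sqlite3.connect(":memory:")
conn.create_function("gc_km", 4, great_circle_km)
conn.executescript("""
    CREATE TABLE devices (id INTEGER PRIMARY KEY);
    CREATE TABLE device_locations (id INTEGER, time INTEGER, latitude REAL, longitude REAL);
    CREATE TABLE checkpoints (latitude REAL, longitude REAL);
    INSERT INTO devices VALUES (1), (2), (3);   -- device 3 has no locations yet
    INSERT INTO device_locations VALUES (1, 1, 0, 0), (1, 2, 10, 10), (2, 5, 50, 50);
    INSERT INTO checkpoints VALUES (10, 11), (50, 49), (0, 0);
""")

rows = conn.execute("""
    SELECT d.id AS device_id,
           dl.latitude AS last_latitude,
           dl.longitude AS last_longitude,
           -- correlated scalar subquery: nearest checkpoint distance
           (SELECT MIN(gc_km(dl.latitude, dl.longitude, cp.latitude, cp.longitude))
            FROM checkpoints cp) AS distance
    FROM devices d
    LEFT JOIN (
        SELECT id AS device_id, latitude, longitude, time,
               MAX(time) OVER (PARTITION BY id) AS max_time
        FROM device_locations
    ) dl ON dl.device_id = d.id AND dl.time = dl.max_time
    ORDER BY d.id
""").fetchall()

for row in rows:
    print(row)
```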

SQL Azure query aggregate performance issue

I'm trying to improve our SQL Azure database performance by replacing a CURSOR, since (as everybody told me) cursors are something to avoid.
Our table holds GPS information: rows with a clustered index on id, secondary indexes on device and timestamp, and a geography index on location.
I'm trying to compute some statistics, such as minimum speed (Doppler and computed), total distance, average speed, ... over a period for a specific device.
I have NO choice on the statistics and CAN'T change the table or the output because it is in production.
I have a clear performance issue when running this inline table-valued function on my SQL Azure DB.
ALTER FUNCTION [dbo].[fn_logMetrics_3]
(
@p_device smallint,
@p_from dateTime,
@p_to dateTime,
@p_moveThresold int = 1
)
RETURNS TABLE
AS
RETURN
(
WITH CTE AS
(
SELECT
ROW_NUMBER() OVER(ORDER BY timestamp) AS RowNum,
Timestamp,
Location,
Alt,
Speed
FROM
LogEvents
WHERE
Device = @p_device
AND Timestamp >= @p_from
AND Timestamp <= @p_to),
CTE1 AS
(
SELECT
t1.Speed as Speed,
t1.Alt as Alt,
t2.Alt - t1.Alt as DeltaElevation,
t1.Timestamp as Time0,
t2.Timestamp as Time1,
DATEDIFF(second, t2.Timestamp, t1.Timestamp) as Duration,
t1.Location.STDistance(t2.Location) as Distance
FROM
CTE t1
INNER JOIN
CTE t2 ON t1.RowNum = t2.RowNum + 1),
CTE2 AS
(
SELECT
Speed, Alt,
DeltaElevation,
Time0, Time1,
Duration,
Distance,
CASE
WHEN Duration <> 0
THEN (Distance / Duration) * 3.6
ELSE NULL
END AS CSpeed,
CASE
WHEN DeltaElevation > 0
THEN DeltaElevation
ELSE NULL
END As PositiveAscent,
CASE
WHEN DeltaElevation < 0
THEN DeltaElevation
ELSE NULL
END As NegativeAscent,
CASE
WHEN Distance < @p_moveThresold
THEN Duration
ELSE NULL
END As StopTime,
CASE
WHEN Distance > @p_moveThresold
THEN Duration
ELSE NULL
END As MoveTime
FROM
CTE1 t1
)
SELECT
COUNT(*) as Count,
MIN(Speed) as HSpeedMin, MAX(Speed) as HSpeedMax,
AVG(Speed) as HSpeedAverage,
MIN(CSpeed) as CHSpeedMin, MAX(CSpeed) as CHSpeedMax,
AVG(CSpeed) as CHSpeedAverage,
SUM(Distance) as CumulativeDistance,
MIN(Alt) as AltMin, MAX(Alt) as AltMax,
SUM(PositiveAscent) as PositiveAscent,
SUM(NegativeAscent) as NegativeAscent,
SUM(StopTime) as StopTime,
SUM(MoveTime) as MoveTime
FROM
CTE2 t1
)
The broad idea is:
CTE selects the corresponding rows, according to the parameters
CTE1 pairs each row with the previous one (self-join on RowNum), in order to get Duration and Distance
CTE2 then performs calculations on these Distance and Duration values
Finally, the last SELECT aggregates (sums, averages, ...) over each column
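The four steps above can be sketched end to end with Python's sqlite3, as a check of the logic rather than of SQL Azure performance. Assumptions: planar x/y coordinates in metres with a hypothetical dist_m function replace geography STDistance, Ts is an integer number of seconds replacing DATEDIFF, only some of the statistics are kept, and the sample data is invented. DeltaElevation is computed here as current minus previous altitude (t1.Alt - t2.Alt, with t1 the later row), so a climb lands in PositiveAscent; the function above uses t2.Alt - t1.Alt, which has the opposite sign.

```python
import math
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical planar stand-in for STDistance: x/y in metres.
conn.create_function("dist_m", 4, lambda x1, y1, x2, y2: math.hypot(x2 - x1, y2 - y1))
conn.execute("CREATE TABLE LogEvents (Device INTEGER, Ts INTEGER, X REAL, Y REAL, Alt REAL, Speed REAL)")
conn.executemany(
    "INSERT INTO LogEvents VALUES (?, ?, ?, ?, ?, ?)",
    [(1, 0, 0, 0, 100, 0), (1, 10, 0, 0, 100, 0),
     (1, 20, 30, 40, 110, 18), (1, 30, 30, 40, 105, 0)],
)

stats = conn.execute("""
    WITH CTE AS (          -- rows for one device and period, numbered by time
        SELECT ROW_NUMBER() OVER (ORDER BY Ts) AS RowNum, Ts, X, Y, Alt, Speed
        FROM LogEvents
        WHERE Device = :p_device AND Ts BETWEEN :p_from AND :p_to
    ),
    CTE1 AS (              -- pair each row (t1) with the previous one (t2)
        SELECT t1.Speed AS Speed, t1.Alt AS Alt,
               t1.Alt - t2.Alt AS DeltaElevation,   -- current minus previous
               t1.Ts - t2.Ts AS Duration,           -- seconds
               dist_m(t2.X, t2.Y, t1.X, t1.Y) AS Distance
        FROM CTE t1 JOIN CTE t2 ON t1.RowNum = t2.RowNum + 1
    ),
    CTE2 AS (              -- per-pair derived values; CASE without ELSE yields NULL
        SELECT Speed, Alt, Distance,
               CASE WHEN Duration <> 0 THEN (Distance / Duration) * 3.6 END AS CSpeed,
               CASE WHEN DeltaElevation > 0 THEN DeltaElevation END AS PositiveAscent,
               CASE WHEN DeltaElevation < 0 THEN DeltaElevation END AS NegativeAscent,
               CASE WHEN Distance < :p_move_threshold THEN Duration END AS StopTime,
               CASE WHEN Distance > :p_move_threshold THEN Duration END AS MoveTime
        FROM CTE1
    )
    SELECT COUNT(*) AS Count,
           MAX(CSpeed) AS CHSpeedMax,
           SUM(Distance) AS CumulativeDistance,
           MIN(Alt) AS AltMin, MAX(Alt) AS AltMax,
           SUM(PositiveAscent) AS PositiveAscent,
           SUM(NegativeAscent) AS NegativeAscent,
           SUM(StopTime) AS StopTime,
           SUM(MoveTime) AS MoveTime
    FROM CTE2
""", {"p_device": 1, "p_from": 0, "p_to": 30, "p_move_threshold": 1}).fetchone()

print(stats)
```

The three consecutive pairs are a 10 s stop, a 50 m move at 18 km/h with a 10 m climb, and a 10 s stop with a 5 m descent.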
Everything works pretty well until the last SELECT, where the aggregate functions (only a few sums and averages) kill the performance.
This query, selecting 1500 rows from a table with 4M rows, takes 1500 ms.
When I replace the last SELECT with
SELECT COUNT(*) as count FROM CTE2 t1
it takes only a few ms (down to 2 ms according to SQL Studio statistics).
With
SELECT
COUNT(*) as Count,
SUM(MoveTime) as MoveTime
it takes about 125 ms.
With
SELECT
COUNT(*) as Count,
SUM(StopTime) as StopTime,
SUM(MoveTime) as MoveTime
it takes about 250 ms.
It looks as if each aggregate runs as a separate loop over all the rows, in the same thread, without being parallelized.
For information, the CURSOR version of this function (which I wrote a couple of years ago) actually runs at least twice as fast.
What is wrong with these aggregates? How can I optimize them?
UPDATE :
The query plan for SELECT COUNT(*) as Count: [screenshot]
The query plan for the full SELECT with aggregates: [screenshot]
Following Joe C's answer, I introduced a #tmp table and performed the aggregates on it. The result is about twice as fast, which is an interesting fact.
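A minimal sketch of that two-step shape with Python's sqlite3, where CREATE TEMP TABLE ... AS plays the role of the #tmp table. It shows only the structure (materialize the paired rows once, then aggregate); it says nothing about SQL Azure timings, and the table layout and data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE LogEvents (Device INTEGER, Ts INTEGER, Alt REAL, Speed REAL)")
conn.executemany(
    "INSERT INTO LogEvents VALUES (?, ?, ?, ?)",
    [(1, 0, 100, 0), (1, 10, 104, 12), (1, 25, 101, 20), (2, 5, 50, 7)],
)

# Step 1: materialize the consecutive-row pairs for device 1 once.
conn.execute("""
    CREATE TEMP TABLE tmp AS
    WITH CTE AS (
        SELECT ROW_NUMBER() OVER (ORDER BY Ts) AS RowNum, Ts, Alt, Speed
        FROM LogEvents WHERE Device = 1
    )
    SELECT t1.Speed AS Speed,
           t1.Ts - t2.Ts AS Duration,
           t1.Alt - t2.Alt AS DeltaElevation
    FROM CTE t1 JOIN CTE t2 ON t1.RowNum = t2.RowNum + 1
""")

# Step 2: run all aggregates against the small materialized table.
summary = conn.execute("""
    SELECT COUNT(*), MIN(Speed), MAX(Speed), AVG(Speed), SUM(Duration),
           SUM(CASE WHEN DeltaElevation > 0 THEN DeltaElevation END) AS PositiveAscent
    FROM tmp
""").fetchone()
print(summary)
```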

TOP1 in CROSS JOIN (SQL SERVER)

I have a child table (position x, position y) and a parent table (position x, position y) in SQL Server. What I want is to find the closest parent to every child. I can do it the "bad way", but there is probably a solution that doesn't use any loops.
This is my code:
SELECT
child.idChild, child.x, child.y,
parent.idParent, parent.x, parent.y,
sqrt(power(child.x - parent.x, 2) + power(child.y - parent.y, 2)) as distance
FROM
child
CROSS JOIN
parent
ORDER BY
idChild, distance
OK, that's fine. But now I want to limit the parents to the TOP 1 for each child.
Thanks
A handy way to do this is with window functions. To get the top row, you can use either row_number() or rank(). They differ when there are ties: row_number() returns only one of the tied rows, while rank() returns all of them.
Here is one way to write the query:
select idChild, x, y, idParent, parentx, parenty
from (SELECT child.idChild, child.x, child.y,
parent.idParent, parent.x as parentx, parent.y as parenty,
ROW_NUMBER() over (partition by child.idchild
order by power(child.x - parent.x, 2) + power(child.y - parent.y, 2)
) as seqnum
FROM child CROSS JOIN
parent
) pc
where seqnum = 1;
I removed the sqrt() from the distance function because it is not necessary when looking for the smallest number.
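Both variants can be compared on a tie with Python's sqlite3 (SQLite 3.25+ assumed; the data is invented). Parents 100 and 101 are equally close to child 1, so ROW_NUMBER keeps an arbitrary one of them while RANK keeps both. The squared distance is written with plain multiplication, since SQLite may not provide power().

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE child (idChild INTEGER, x REAL, y REAL);
    CREATE TABLE parent (idParent INTEGER, x REAL, y REAL);
    INSERT INTO child VALUES (1, 0, 0), (2, 10, 10);
    -- parents 100 and 101 are equally far from child 1 (a tie)
    INSERT INTO parent VALUES (100, 1, 0), (101, -1, 0), (102, 9, 9);
""")

# {func} is filled only with the trusted literals below.
QUERY = """
    SELECT idChild, idParent
    FROM (
        SELECT c.idChild, p.idParent,
               {func}() OVER (
                   PARTITION BY c.idChild
                   -- squared distance: no sqrt needed to find the minimum
                   ORDER BY (c.x - p.x) * (c.x - p.x) + (c.y - p.y) * (c.y - p.y)
               ) AS seqnum
        FROM child c CROSS JOIN parent p
    ) AS pc
    WHERE seqnum = 1
    ORDER BY idChild, idParent
"""

one_per_child = conn.execute(QUERY.format(func="ROW_NUMBER")).fetchall()
ties_kept = conn.execute(QUERY.format(func="RANK")).fetchall()
print("row_number:", one_per_child)
print("rank:      ", ties_kept)
```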