Finding Nearest adjacent point Recursively - sql

I have a dataset of all locations my phone has been. (I got it via Google Takeout, if you are interested.) The problem with the data is that at a certain point I got a second phone, and the dataset has no information that lets me attribute a point to a specific phone. So if I leave one phone at home, the data shows me in two places at once. I decided to write a query that tries to find adjacent points by determining which of the previous 5 points is closest, eliminating any point I would have had to travel faster than 150 mph to reach.
The table definition for the data is here:
CREATE TABLE [dbo].[locationdata](
[ID] [bigint] IDENTITY(1,1) NOT NULL,
[t] [datetime] NULL,
[lat] [float] NULL,
[long] [float] NULL,
[accuracy] [smallint] NULL,
[activity] [varchar](14) NULL,
[confidence] [int] NULL,
[velocity] [varchar](2) NULL,
[altitude] [smallint] NULL,
[heading] [smallint] NULL,
[point] [geography] NULL,
[tag] [varchar](50) NULL,
CONSTRAINT [PK_locationdata] PRIMARY KEY CLUSTERED
(
[ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
GO
The rows were inserted in time order, so the IDs follow chronological order, except that the same time can exist for multiple points.
Here is my attempt at writing this without a CURSOR. The issue is that you can't use TOP in the recursive part of a common table expression.
WITH tripdata(originid, endid, startid, speed, distance, startpoint, startt)
AS
(
SELECT originid, endid, startid, speed, distance, startpoint, startt
FROM
(
SELECT
origin.id as originid
, NULL as endid
, origin.id as startid
, NULL as distance
, NULL as speed
, point as startpoint
, t as startt
FROM locationdata origin
) a
UNION ALL
SELECT
originid as originid
, startid as endid
, l.id as startid
, origin.startpoint.STDistance(l.point) as distance
, (origin.startpoint.STDistance(l.point)/(datediff(S, origin.startt, l.t))) * -2.23694 as speed
, l.point as startpoint
, l.t as startt
FROM tripdata origin
CROSS APPLY
(
SELECT top 1
z.id
,z.point
,z.t
FROM locationdata z
where origin.startid > z.ID and origin.startid -5 < z.ID
and z.t <> origin.startt
and (origin.startpoint.STDistance(z.point)/(datediff(S, origin.startt, z.t))) * -2.23694 < 150
order by origin.startpoint.STDistance(z.point)
) l
)
SELECT *
FROM tripdata
WHERE originid = 218255
;
I am open to suggestions on how this query might be fixed or if it is even possible.
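One way to sidestep the TOP restriction is to drop the recursion and walk the points iteratively. The following is only a minimal sketch of that walk, written in Python rather than T-SQL and under stated assumptions: the sample coordinates are made up, haversine_m is a hypothetical stand-in for geography.STDistance (a spherical approximation), and the window of 4 earlier rows mirrors the query's condition startid - 5 < ID < startid.

```python
import math
from datetime import datetime

MPS_TO_MPH = 2.23694  # metres/second to miles/hour, the factor used in the query


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres (spherical Earth, radius 6,371 km)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlat = p2 - p1
    dlon = math.radians(lon2 - lon1)
    h = math.sin(dlat / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlon / 2) ** 2
    return 2 * 6371000.0 * math.asin(math.sqrt(h))


def walk_back(points, start_index, window=4, max_mph=150.0):
    """Follow the nearest plausible earlier point until none qualifies.

    points is a list of (id, t, lat, lon) tuples sorted by id.
    Returns the ids visited, starting with the origin.
    """
    i = start_index
    path = [points[i][0]]
    while True:
        _, t0, lat0, lon0 = points[i]
        best = None  # (distance, index) of the nearest acceptable candidate
        for j in range(max(0, i - window), i):
            _, t1, lat1, lon1 = points[j]
            seconds = abs((t0 - t1).total_seconds())
            if seconds == 0:
                continue  # same timestamp: skipped as in the query, avoids dividing by zero
            dist = haversine_m(lat0, lon0, lat1, lon1)
            if dist / seconds * MPS_TO_MPH >= max_mph:
                continue  # would require travelling at 150 mph or more
            if best is None or dist < best[0]:
                best = (dist, j)
        if best is None:
            return path
        i = best[1]
        path.append(points[i][0])


def at(hms):
    """Build a timestamp on an arbitrary (made-up) day."""
    return datetime.strptime("2016-01-01 " + hms, "%Y-%m-%d %H:%M:%S")


# Made-up data: odd ids are a phone left at home, even ids a phone on a walk.
points = [
    (1, at("12:00:00"), 40.000, -75.0),
    (2, at("12:00:30"), 41.000, -75.0),
    (3, at("12:01:00"), 40.000, -75.0),
    (4, at("12:01:30"), 41.001, -75.0),
    (5, at("12:02:00"), 40.000, -75.0),
    (6, at("12:02:30"), 41.002, -75.0),
]

trip = walk_back(points, start_index=5)
print(trip)  # [6, 4, 2]
```

Starting from ID 6, the walk only ever picks the even IDs, because reaching the phone at home would require covering roughly 111 km in under two minutes. In T-SQL the same loop could be written with a WHILE loop and a temp table, which is allowed to use TOP.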

Related

RANK() SQL Server execution plan issue

What is driving SQL Server to use a less optimal execution plan for queries that return 6000+ rows? I need to improve query performance for the scenario where all rows are returned.
I select all fields and add a rank over the same three columns that are included in the index. Depending on the number of rows returned, the query gets one of two execution plans, and execution takes 0.2s or 3s respectively.
From 1 row up to ca. 5000 rows returned, the query runs fast. From 6000 rows up to all rows, the query runs slow.
Table1 has ca. 38000 rows. The database runs on Azure SQL v12.
Table:
CREATE TABLE [dbo].[Table1](
[ID] [int] IDENTITY(1,1) NOT NULL,
[KOD_ID] [int] NULL,
[SYM] [nvarchar](20) NULL,
[AN] [nvarchar](35) NULL,
[A] [nvarchar](10) NULL,
[B] [nvarchar](2) NULL,
[C] [datetime] NULL,
[D] [datetime] NULL,
CONSTRAINT [PK_Table1] PRIMARY KEY CLUSTERED
(
[ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
)
GO
CREATE NONCLUSTERED INDEX [IX_Table1] ON [dbo].[Table1]
(
[KOD_ID] ASC,
[SYM] ASC,
[AN] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
GO
Queries:
SELECT TOP 6000 *, RANK() OVER(ORDER BY KOD_ID ASC, SYM ASC, AN ASC) AS Rank#
FROM [dbo].[Table1]
SELECT TOP 7000 *, RANK() OVER(ORDER BY KOD_ID ASC, SYM ASC, AN ASC) AS Rank#
FROM [dbo].[Table1]
Execution plans for both queries
CREATE NONCLUSTERED INDEX [IX_Table1] ON [dbo].[Table1]
(
[KOD_ID] ASC,
[SYM] ASC,
[AN] ASC
) INCLUDE ([A], [B], [C], [D])
WITH (DROP_EXISTING = ON); -- an index named IX_Table1 already exists, so replace it
Create a covering index like this one. The query should then scan this index, and most likely a sort won't even be needed because the data is already sorted in the index.
The key points in your queries are:
- The first plan has a key lookup. Avoid key lookups as much as possible (a key lookup is an additional seek into the clustered index for each row, needed because the nonclustered index does not contain the requested columns) by creating covering indexes with INCLUDE columns.
- Avoid sort operations too; they are costly for SQL Server.
If you're alright with index rebuilds and favor reads over inserts, these could be alternate DDLs for your table, assuming that KOD_ID, SYM and AN are not nullable:
If ID is needed to ensure uniqueness:
CREATE TABLE [dbo].[Table1] (
[KOD_ID] [int] NOT NULL
, [SYM] [nvarchar](20) NOT NULL
, [AN] [nvarchar](35) NOT NULL
, [ID] [int] IDENTITY(1, 1) NOT NULL
, [A] [nvarchar](10) NULL
, [B] [nvarchar](2) NULL
, [C] [datetime2] NULL
, [D] [datetime2] NULL
, CONSTRAINT [PK_Table1] PRIMARY KEY CLUSTERED ([KOD_ID], [SYM], [AN], [ID])
);
GO
If ID is not needed to ensure uniqueness:
CREATE TABLE [dbo].[Table1] (
[KOD_ID] [int] NOT NULL
, [SYM] [nvarchar](20) NOT NULL
, [AN] [nvarchar](35) NOT NULL
, [A] [nvarchar](10) NULL
, [B] [nvarchar](2) NULL
, [C] [datetime2] NULL
, [D] [datetime2] NULL
, CONSTRAINT [PK_Table1] PRIMARY KEY CLUSTERED ([KOD_ID], [SYM], [AN])
);
GO
Also, note that I use datetime2 instead of datetime, that's what Microsoft recommends: https://learn.microsoft.com/en-us/sql/t-sql/data-types/datetime-transact-sql
Use the time, date, datetime2 and datetimeoffset data
types for new work. These types align with the SQL Standard. They are
more portable. time, datetime2 and datetimeoffset provide
more seconds precision. datetimeoffset provides time zone support
for globally deployed applications.

sql running total or Balance

I need a running VoucherNo concatenation, just like a running balance or total: concatenate the previous VoucherNo values to the current VoucherNo row by row, as shown in the picture.
Query is:
select
v.VoucherDate,v.VoucherNo,v.VoucherType,v.Narration,SUM(v.Debit) Debit , SUM(v.Credit) Credit,dbo.GetBalance(v.CompanyProfileId,v.AccountCode,v.VoucherDate ,SUM(v.Debit), SUM(v.Credit)) Balance
from AcVoucher v
where v.VoucherDate Between '2016-03-24' and '2016-03-30' and v.CompanyProfileId = 2 and v.AccountCode = '05010001'
group by v.VoucherNo,v.VoucherDate,v.VoucherType,v.Narration,v.CompanyProfileId,v.AccountCode
Schema :
GO
/****** Object: Table [dbo].[AcVoucher] Script Date: 03/30/2016 3:47:02 PM ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[AcVoucher](
[Id] [bigint] IDENTITY(1,1) NOT NULL,
[CompanyProfileId] [int] NOT NULL,
[AccountCode] [nvarchar](50) NOT NULL,
[VoucherNo] [bigint] NOT NULL,
[VoucherType] [nvarchar](5) NOT NULL,
[VoucherDate] [datetime] NOT NULL,
[Narration] [nvarchar](500) NULL,
[Debit] [float] NOT NULL,
[Credit] [float] NOT NULL,
[TaxPercentage] [float] NULL,
[DiscountPercentage] [float] NULL,
[CreatedBy] [int] NULL,
[CreatedDate] [datetime] NULL,
CONSTRAINT [PK_ACVoucher_1] PRIMARY KEY CLUSTERED
(
[Id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
You can do this using basically the same logic as for string concatenation in SQL Server. The only difference is the where clause.
One key issue is the ordering for the concatenation. This is not apparent in the question, so I added a minid to the query and this is used (in reverse order) for selecting the ids to bring together:
with v as (
select v.VoucherDate, v.VoucherNo, v.VoucherType, v.Narration,
SUM(v.Debit) as Debit , SUM(v.Credit) as Credit,
dbo.GetBalance(v.CompanyProfileId, v.AccountCode, v.VoucherDate,
SUM(v.Debit), SUM(v.Credit)
) as Balance,
min(id) as minid
from AcVoucher v
where v.VoucherDate Between '2016-03-24' and '2016-03-30' and
v.CompanyProfileId = 2 and v.AccountCode = '05010001'
group by v.VoucherNo, v.VoucherDate, v.VoucherType,v.Narration, v.CompanyProfileId, v.AccountCode
)
select v.*,
stuff((select ',' + cast(v2.VoucherNo as varchar(8000))
from v v2
where v2.minid >= v.minid
for xml path ('')
), 1, 1, '') as RunningConcat
from v;
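To make the target output concrete, here is a small Python sketch of a running concatenation over made-up voucher numbers (the values 101, 102 and 105 are hypothetical, not taken from the question). The first list accumulates from the first row forward, the way a running balance does. The second list mirrors the query's v2.minid >= v.minid condition, which collects the current row and everything after it; flipping that comparison to <= would give the forward direction.

```python
from itertools import accumulate

voucher_nos = [101, 102, 105]  # hypothetical VoucherNo values in row order

# Forward: each element is the previous running string plus the current VoucherNo.
running_concat = list(
    accumulate((str(v) for v in voucher_nos), lambda acc, v: acc + "," + v)
)

# Reverse: each element is the current VoucherNo plus all later ones,
# which is what a ">=" comparison on the ordering column selects.
reverse_concat = [
    ",".join(str(v) for v in voucher_nos[i:]) for i in range(len(voucher_nos))
]

print(running_concat)  # ['101', '101,102', '101,102,105']
print(reverse_concat)  # ['101,102,105', '102,105', '105']
```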

SQL Descending ordered LEFT JOIN subquery issue

I have the following query.
SELECT r1.*,
r2.vlag54,
r2.vlag55
FROM [rxmon].[dbo].[a] AS r1
LEFT JOIN [rxmon].[dbo].[b] AS r2
ON r2.artikelnummer = r1.drug_id
LEFT JOIN (SELECT *
FROM [rxmon].[dbo].[c]) AS r3
ON r3.pid = r1.patient_id
WHERE r3.obx_id = 20937
AND Cast(r3.obx_datetime AS DATE) = Cast(Getdate() - 1 AS DATE)
AND r1.patient_id = 7092425
AND obx_value < CASE
WHEN r2.vlag54 = 1 THEN 30
WHEN r2.vlag55 = 1 THEN 50
END
AND r2.vlag54 = CASE
WHEN r3.obx_value < 30 THEN 1
ELSE 0
END
AND r2.vlag55 = CASE
WHEN r3.obx_value BETWEEN 30 AND 50 THEN 1
ELSE 0
END
ORDER BY obx_datetime DESC;
The problem is that table C can contain multiple records for the PID join, so the same table A record is returned once for each matching record in table C.
Table C needs to be joined on its latest record only, so just one row of C. That way the table A record will not be repeated.
I tried TOP 1 and ORDER BY, but that can't be used in a subquery.
-- TABLE A
CREATE TABLE [dbo].[A](
[EVS_MO_ID] [bigint] NOT NULL,
[DRUG_ID] [varchar](50) NOT NULL,
[ATC_CODE] [varchar](15) NULL,
[DRUG_NAME] [varchar](1024) NULL,
[PATIENT_ID] [varchar](50) NOT NULL,
[PATIENT_LOCATION] [varchar](10) NULL,
[MO_DATE] [datetime2](7) NOT NULL,
[MO_START_DATE] [datetime2](7) NOT NULL,
[MO_STOP_DATE] [datetime2](7) NULL,
[ROUTE] [varchar](50) NULL,
[MEDICATION_CONTAINER] [smallint] NULL,
[PRESCRIBING_DOCTOR_NAME] [varchar](50) NULL,
[PRESCRIBING_DOCTOR_SURNAME] [varchar](50) NULL,
[MO_ACTIVE] [bit] NOT NULL,
CONSTRAINT [PK_MedicationOrders] PRIMARY KEY CLUSTERED
(
[EVS_MO_ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = ON, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
INSERT INTO [dbo].[A]
VALUES
(5411409,'97941689', 'B01AB06','NADROPARINE 0.8ML','7092425','ANBC', '2015-12-15 20:58:06.2030000',
'2015-12-16 00:00:00.0000000', '', 'IV', 1, 'GEORGE','LAST', 1);
-- TABLE B
CREATE TABLE [dbo].[B](
[ID] [int] IDENTITY(1,1) NOT NULL,
[ARTIKELNUMMER] [varchar](50) NOT NULL,
[VLAG54] [bit] NULL,
[VLAG55] [bit] NULL CONSTRAINT [DF_Table_1_VLAG50] DEFAULT ((0)),
[VLAG100] [bit] NULL CONSTRAINT [DF_ArtikelVlaggen_VLAG100] DEFAULT ((0)),
CONSTRAINT [PK_B] PRIMARY KEY CLUSTERED
(
[ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
INSERT INTO [dbo].[B]
([ARTIKELNUMMER]
,[VLAG54]
,[VLAG55]
,[VLAG100])
VALUES
('97941689', 1,0,1);
-- TABLE C
CREATE TABLE [dbo].[C](
[ID] [int] IDENTITY(1,1) NOT NULL,
[OBX_DATETIME] [datetime2](7) NOT NULL,
[PID] [int] NOT NULL,
[DEPARTMENT] [varchar](8) NOT NULL,
[OBX_ID] [int] NOT NULL,
[OBX_VALUE] [decimal](5, 2) NOT NULL,
[OBX_UNITS] [varchar](10) NULL,
[REF_RANGE] [varchar](40) NULL,
[FLAG] [varchar](2) NULL,
CONSTRAINT [PK_C] PRIMARY KEY CLUSTERED
(
[ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
INSERT INTO [dbo].[C]
([OBX_DATETIME]
,[PID]
,[DEPARTMENT]
,[OBX_ID]
,[OBX_VALUE]
,[OBX_UNITS]
,[REF_RANGE]
,[FLAG])
VALUES
('2015-12-15 14:01:00.0000000',7092425, '8NAH', 20937, 27.00, 'mL/min', '> 60', 'L');
INSERT INTO [dbo].[C]
([OBX_DATETIME]
,[PID]
,[DEPARTMENT]
,[OBX_ID]
,[OBX_VALUE]
,[OBX_UNITS]
,[REF_RANGE]
,[FLAG])
VALUES
('2015-12-15 06:30:00.0000000',7092425, '6ZPA', 20937, 28.00, 'mL/min', '> 60', 'L');
This will order them by OBX_DATETIME and take only the first one:
...
LEFT JOIN (
SELECT pid, obx_id, obx_datetime, obx_value
, n = ROW_NUMBER() over(PARTITION BY pid ORDER BY obx_datetime desc)
FROM [rxmon].[dbo].[c]
) AS r3
ON r3.pid = r1.patient_id and r3.n = 1
...
If OBX_DATETIME values are inserted incrementally (newer dates only), you can order by ID instead.
This SQL Fiddle with your query and sample data/tables returns 2 rows: http://sqlfiddle.com/#!3/df36c/2/0
This SQL Fiddle with the new subquery returns 1 row: http://sqlfiddle.com/#!3/df36c/1/0
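The "latest row per PID" pattern can be checked on a scaled-down copy of table C. This is a sketch only: it runs in SQLite through Python's sqlite3 module instead of SQL Server (assuming SQLite 3.25 or later for window functions), and the table is reduced to the columns the pattern needs. The two rows are the ones from the question. Note that in this sample the row with the higher ID has the older OBX_DATETIME, so ordering by ID picks a different row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE c (
    id INTEGER PRIMARY KEY,
    obx_datetime TEXT,
    pid INTEGER,
    obx_id INTEGER,
    obx_value REAL
);
INSERT INTO c (obx_datetime, pid, obx_id, obx_value) VALUES
    ('2015-12-15 14:01:00', 7092425, 20937, 27.00),
    ('2015-12-15 06:30:00', 7092425, 20937, 28.00);
""")

# Number the rows of each pid from newest to oldest and keep number 1.
latest_by_datetime = conn.execute("""
SELECT pid, obx_datetime, obx_value
FROM (
    SELECT pid, obx_datetime, obx_value,
           ROW_NUMBER() OVER (PARTITION BY pid ORDER BY obx_datetime DESC) AS n
    FROM c
) AS r3
WHERE n = 1
""").fetchall()

# Same pattern ordered by id: only equivalent if ids follow the timestamps.
latest_by_id = conn.execute("""
SELECT pid, obx_datetime, obx_value
FROM (
    SELECT pid, obx_datetime, obx_value,
           ROW_NUMBER() OVER (PARTITION BY pid ORDER BY id DESC) AS n
    FROM c
) AS r3
WHERE n = 1
""").fetchall()

print(latest_by_datetime)  # [(7092425, '2015-12-15 14:01:00', 27.0)]
print(latest_by_id)        # [(7092425, '2015-12-15 06:30:00', 28.0)]
```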
You are using a LEFT JOIN on r3 but also have r3 in your WHERE clause with an equality operator:
WHERE r3.obx_id = 20937
AND Cast(r3.obx_datetime AS DATE) = Cast(Getdate() - 1 AS DATE)
These conditions remove the rows where the left join found no match in r3 (the r3 columns are NULL there), so the LEFT JOIN behaves like an INNER JOIN. Perhaps you should move them into the subquery, or use INNER JOIN.
You should also avoid using the DB name in your query unless the query is run from another DB on the same server. This will be fine:
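The effect of filtering a left-joined table in the WHERE clause can be seen on two tiny tables. This is a sketch with made-up data, run in SQLite through Python's sqlite3 module rather than in SQL Server; the join semantics shown are standard SQL. Patient 2 has no row in c, so a condition on c in the WHERE clause discards that patient, while the same condition in the ON clause keeps the patient with a NULL value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE a (patient_id INTEGER, drug_name TEXT);
CREATE TABLE c (pid INTEGER, obx_id INTEGER, obx_value REAL);
INSERT INTO a VALUES (1, 'drug A'), (2, 'drug B');  -- made-up rows
INSERT INTO c VALUES (1, 20937, 27.0);              -- only patient 1 has a result
""")

# Condition in WHERE: rows without a match in c fail the test and disappear.
filtered_in_where = conn.execute("""
SELECT a.patient_id, c.obx_value
FROM a
LEFT JOIN c ON c.pid = a.patient_id
WHERE c.obx_id = 20937
ORDER BY a.patient_id
""").fetchall()

# Condition in ON: unmatched rows of a survive with NULL in the c columns.
filtered_in_on = conn.execute("""
SELECT a.patient_id, c.obx_value
FROM a
LEFT JOIN c ON c.pid = a.patient_id AND c.obx_id = 20937
ORDER BY a.patient_id
""").fetchall()

print(filtered_in_where)  # [(1, 27.0)]
print(filtered_in_on)     # [(1, 27.0), (2, None)]
```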
SELECT ... FROM [dbo].[a] AS r1 ...
Using SELECT * is also a bad habit. You should list only the columns your code will use.
Try this, @Shift:
SELECT r1.*,
r2.vlag54,
r2.vlag55
FROM [dbo].[a] AS r1
LEFT JOIN [dbo].[b] AS r2
ON r2.artikelnummer = r1.drug_id
LEFT JOIN (
SELECT
ROW_NUMBER() OVER (PARTITION BY pid ORDER BY id DESC) RN,
c.*
FROM C
) r3
ON r3.pid = r1.patient_id AND r3.RN = 1
WHERE r3.obx_id = 20937
AND Cast(r3.obx_datetime AS DATE) = Cast(Getdate() - 1 AS DATE)
AND r1.patient_id = 7092425
AND obx_value < CASE
WHEN r2.vlag54 = 1 THEN 30
WHEN r2.vlag55 = 1 THEN 50
END
AND r2.vlag54 = CASE
WHEN r3.obx_value < 30 THEN 1
ELSE 0
END
AND r2.vlag55 = CASE
WHEN r3.obx_value BETWEEN 30 AND 50 THEN 1
ELSE 0
END
ORDER BY obx_datetime DESC;

How to update a column via Row_Number with a different value for each row?

I have this table right now
CREATE TABLE [dbo].[DatosLegales](
[IdCliente] [int] NOT NULL,
[IdDatoLegal] [int] NULL,
[Nombre] [varchar](max) NULL,
[RFC] [varchar](13) NULL,
[CURP] [varchar](20) NULL,
[IMSS] [varchar](20) NULL,
[Calle] [varchar](100) NULL,
[Numero] [varchar](10) NULL,
[Colonia] [varchar](100) NULL,
[Pais] [varchar](50) NULL,
[Estado] [varchar](50) NULL,
[Ciudad] [varchar](50) NULL,
[CodigoPostal] [varchar](10) NULL,
[Telefono] [varchar](13) NULL,
[TipoEmpresa] [varchar](20) NULL,
[Tipo] [varchar](20) NULL,
CONSTRAINT [PK_DatosLegales] PRIMARY KEY CLUSTERED
(
[IdCliente] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
)
I need to update the IdDatoLegal column. Right now I have 80 rows in that table, so I need to update each row with the numbers 1, 2, 3... 79, 80.
I have tried everything from simple queries to stored procedures, with no success at all.
I have this stored procedure right now:
ALTER PROCEDURE dbo.ActualizarDatosLegales
@RowCount int
AS
DECLARE @Inicio int
SET @Inicio = 0
WHILE @Inicio < @@RowCount
SET @Inicio += 1;
BEGIN
UPDATE DatosLegales SET IdDatoLegal = @Inicio WHERE (SELECT ROW_NUMBER() OVER (ORDER BY IdCliente) AS RowNum FROM DatosLegales) = @Inicio;
END
It returns this message when I run it
Subquery returned more than 1 value. This is not permitted when the subquery follows =, !=, <, <= , >, >= or when the subquery is used as an expression.
I guess that's because the subquery (SELECT ROW_NUMBER() OVER (ORDER BY IdCliente) AS RowNum FROM DatosLegales) returns 80 rows where it should only return one (but each time it should be a different number).
Do you know what I have to add to the subquery to make it work? And above all, is the loop and the rest of the procedure right?
Thanks in advance.
You can update all the rows in one statement using a CTE as below.
;WITH T
AS (SELECT IdDatoLegal,
Row_number() OVER (ORDER BY IdCliente ) AS RN
FROM dbo.DatosLegales)
UPDATE T
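SET IdDatoLegal = RN;
Alternatively, the same update can be written as a join against a derived table that carries the row numbers:
UPDATE D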
SET IdDatoLegal = RN
FROM DatosLegales D JOIN
(
SELECT IdCliente, Row_number() OVER (ORDER BY IdCliente) AS RN
FROM DatosLegales
) Temp
ON D.IdCliente = Temp.IdCliente
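The set-based idea (number all rows once, then match each row to its number through the key) can be tried out on a three-row stand-in for the table. This is a sketch under assumptions: it runs in SQLite through Python's sqlite3 module (SQLite 3.25 or later for window functions), the IdCliente values are made up, and because updatable CTEs and UPDATE ... FROM ... JOIN are T-SQL features, the match is written here as a correlated subquery instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DatosLegales (IdCliente INTEGER PRIMARY KEY, IdDatoLegal INTEGER);
INSERT INTO DatosLegales (IdCliente) VALUES (25), (7), (10);  -- made-up keys
""")

# One statement, no loop: the derived table t assigns a row number per
# IdCliente, and each row looks up its own number through the key.
conn.execute("""
UPDATE DatosLegales
SET IdDatoLegal = (
    SELECT t.RN
    FROM (SELECT IdCliente,
                 ROW_NUMBER() OVER (ORDER BY IdCliente) AS RN
          FROM DatosLegales) AS t
    WHERE t.IdCliente = DatosLegales.IdCliente
)
""")

numbered = conn.execute(
    "SELECT IdCliente, IdDatoLegal FROM DatosLegales ORDER BY IdCliente"
).fetchall()
print(numbered)  # [(7, 1), (10, 2), (25, 3)]
```

The subquery in the original procedure failed because it was not tied to a single row; here the WHERE t.IdCliente = DatosLegales.IdCliente condition guarantees exactly one value per updated row.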

How to determine size of continuous range for given criteria?

I have a positions table in SQL Server 2008R2 (definition below).
In the system, boxes contain positions.
I have a requirement to find a box which has X free positions remaining. However, the X positions must be continuous (left to right, top to bottom, i.e. ascending PositionID).
It has been simple to construct a query that finds a box with X positions free. I now have the problem of determining whether the positions are continuous.
Any suggestions on a TSQL based solution?
Table Definition
CREATE TABLE [dbo].[Position](
[PositionID] [int] IDENTITY(1,1) NOT NULL,
[BoxID] [int] NOT NULL,
[pRow] [int] NOT NULL,
[pColumn] [int] NOT NULL,
[pRowLetter] [char](1) NOT NULL,
[pColumnLetter] [char](1) NOT NULL,
[SampleID] [int] NULL,
[ChangeReason] [nvarchar](4000) NOT NULL,
[LastUserID] [int] NOT NULL,
[TTSID] [bigint] NULL,
CONSTRAINT [PK_Position] PRIMARY KEY CLUSTERED
(
[PositionID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
Edit
http://pastebin.com/V8DLiucN - pastebin link with sample positions for 1 box (all positions empty in sample data)
Edit 2
A 'free' position is one with SampleID = null
DECLARE @AvailableSlots INT
SET @AvailableSlots = 25
;WITH OrderedSet AS (
SELECT
BoxID,
PositionID,
Row_Number() OVER (PARTITION BY BoxID ORDER BY PositionID) AS rn
FROM
Position
WHERE
SampleID IS NULL
)
SELECT
BoxID,
COUNT(*) AS AvailableSlots,
MIN(PositionID) AS StartingPosition,
MAX(PositionID) AS EndingPosition
FROM
OrderedSet
GROUP BY
PositionID - rn,
BoxID
HAVING
COUNT(*) >= @AvailableSlots
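The trick is the PositionID - rn (row number) expression in the GROUP BY clause. Within a run of consecutive free positions, PositionID and rn both increase by 1 from row to row, so their difference stays constant; a gap in PositionID changes the difference. Grouping on it therefore groups together continuous sets, and from there it's easy to use HAVING to limit the results to the BoxIDs that have the required number of free slots.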
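The grouping can be traced on a small made-up layout. This sketch runs the same query shape in SQLite through Python's sqlite3 module (assuming SQLite 3.25 or later for window functions) on a cut-down Position table; the box contents and the threshold of 3 are hypothetical. Box 1 has positions 3 and 7 occupied, leaving free runs 1-2, 4-6 and 8; box 2 is entirely free.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Position (PositionID INTEGER PRIMARY KEY, BoxID INTEGER, SampleID INTEGER)"
)
# (PositionID, BoxID, SampleID); SampleID NULL means the position is free.
conn.executemany(
    "INSERT INTO Position VALUES (?, ?, ?)",
    [
        (1, 1, None), (2, 1, None), (3, 1, 501), (4, 1, None),
        (5, 1, None), (6, 1, None), (7, 1, 502), (8, 1, None),
        (9, 2, None), (10, 2, None), (11, 2, None), (12, 2, None),
    ],
)

required_slots = 3  # hypothetical threshold

# PositionID - rn is constant inside each run of consecutive free positions,
# so grouping on it yields one group per run.
runs = conn.execute(
    """
    WITH OrderedSet AS (
        SELECT BoxID, PositionID,
               ROW_NUMBER() OVER (PARTITION BY BoxID ORDER BY PositionID) AS rn
        FROM Position
        WHERE SampleID IS NULL
    )
    SELECT BoxID,
           COUNT(*) AS AvailableSlots,
           MIN(PositionID) AS StartingPosition,
           MAX(PositionID) AS EndingPosition
    FROM OrderedSet
    GROUP BY PositionID - rn, BoxID
    HAVING COUNT(*) >= ?
    ORDER BY BoxID, StartingPosition
    """,
    (required_slots,),
).fetchall()

print(runs)  # [(1, 3, 4, 6), (2, 4, 9, 12)]
```

In box 1 the free positions 4, 5 and 6 get row numbers 3, 4 and 5, so the difference is 1 for all three, while positions 1-2 give 0 and position 8 gives 2; only the run of three passes the HAVING filter.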