Nearest Neighbor for Point in Time - sql

Say I have two tables (SQL Fiddle). One has recorded values at various timestamps; the other lists the IDs and datetimes at which to sample the nearest values. Using something similar to Kevin Meade's NEAREST NEIGHBOR PREFERENCE LOW (but in SQL Server 2008), I want to find the non-null value for the indicated ID that is closest to the target (census) date but not after it. If a row matches the census date exactly, use that one (unless its value is null). If no row falls on or before the census date, find the row that is closest to the census date but after it and use that one.
First Table:
CREATE TABLE Recorded_Vent_Types
([PAT_ENC_CSN_ID] int, [RECORDED_TIME] datetime, [MEAS_VALUE] varchar(9));
INSERT INTO Recorded_Vent_Types
([PAT_ENC_CSN_ID], [RECORDED_TIME], [MEAS_VALUE])
VALUES
(11117777, '2013-06-08 19:36:00.000', 'SIMV/PRVC'),
(11117777, '2013-06-08 22:21:00.000', 'PRVC/AC'),
(11117777, '2013-06-09 00:10:00.000', NULL),
(11117777, '2013-06-09 03:00:00.000', 'SIMV/PRVC'),
(11117777, '2013-06-09 23:59:00.000', 'SIMV/PRVC'),
(11117777, '2013-06-10 00:00:00.000', 'NAVA'),
(11117777, '2013-06-10 00:20:00.000', 'PS'),
(11117777, '2013-06-10 00:25:00.000', NULL),
(555999, '2013-06-08 00:36:00.000', NULL),
(555999, '2013-06-08 22:21:00.000', 'PRVC/AC'),
(555999, '2013-06-09 00:10:00.000', 'SIMV/PRVC'),
(555999, '2013-06-11 23:15:00.000', 'BIVENT'),
(555999, '2013-06-12 00:00:00.000', NULL),
(555999, '2013-06-12 00:20:00.000', 'PS');
Second Table:
CREATE TABLE Census
([PAT_ENC_CSN_ID] int, [CENSUS_TIME] datetime);
INSERT INTO Census
([PAT_ENC_CSN_ID], [CENSUS_TIME])
VALUES
(11117777, '2013-06-08 00:00:00'),
(11117777, '2013-06-09 00:00:00'),
(11117777, '2013-06-10 00:00:00'),
(11117777, '2013-06-11 00:00:00'),
(555999, '2013-06-08 00:00:00'),
(555999, '2013-06-09 00:00:00'),
(555999, '2013-06-11 00:00:00'),
(555999, '2013-06-12 00:00:00');
Here's Mr Meade's Oracle Code for something similar within one table for given ID:
select *
from claim_history
where claim_id = 1
and status_date =
(
select min(status_date)
from (
select max(status_date) status_date
from claim_history
where claim_id = 1
and status_date <= sysdate-3
union all
select min(status_date)
from claim_history
where claim_id = 1
and status_date > sysdate-3
)
)
/
My Desired Result set:
PAT_ENC_CSN_ID CENSUS_TIME RECORDED_TIME MEAS_VALUE
555999 June, 08 2013 00:00:00+0000 June, 08 2013 22:21:00+0000 PRVC/AC
555999 June, 09 2013 00:00:00+0000 June, 08 2013 22:21:00+0000 PRVC/AC
555999 June, 11 2013 00:00:00+0000 June, 09 2013 00:10:00+0000 SIMV/PRVC
555999 June, 12 2013 00:00:00+0000 June, 11 2013 23:15:00+0000 BIVENT
11117777 June, 08 2013 00:00:00+0000 June, 08 2013 19:36:00+0000 SIMV/PRVC
11117777 June, 09 2013 00:00:00+0000 June, 08 2013 22:21:00+0000 PRVC/AC
11117777 June, 10 2013 00:00:00+0000 June, 10 2013 00:00:00+0000 NAVA
11117777 June, 11 2013 00:00:00+0000 June, 10 2013 00:20:00+0000 PS
@Gordon Linoff gave me the idea to use absolute values of the date diff between census times and recorded times. This led me to modify @bobs's solution here.
SELECT * FROM
(
SELECT rvt.PAT_ENC_CSN_ID, CENSUS_TIME, RECORDED_TIME, MEAS_VALUE, ABS(DATEDIFF(s, c.CENSUS_TIME, RECORDED_TIME)) diff,
ROW_NUMBER() OVER (PARTITION BY rvt.PAT_ENC_CSN_ID, c.CENSUS_TIME ORDER BY ABS(DATEDIFF(s, c.CENSUS_TIME, RECORDED_TIME))) AS SEQUENCE
FROM Recorded_Vent_Types rvt join Census c on rvt.PAT_ENC_CSN_ID=c.PAT_ENC_CSN_ID
WHERE MEAS_VALUE IS NOT NULL
) as m
WHERE SEQUENCE = 1
ORDER BY PAT_ENC_CSN_ID,CENSUS_TIME
;
But this returns the (absolute) closest recorded time, with no preference given to a recorded time prior to the census time.
Result:
PAT_ENC_CSN_ID CENSUS_TIME RECORDED_TIME MEAS_VALUE
555999 June, 08 2013 00:00:00+0000 June, 08 2013 22:21:00+0000 PRVC/AC
555999 June, 09 2013 00:00:00+0000 June, 09 2013 00:10:00+0000 SIMV/PRVC
555999 June, 11 2013 00:00:00+0000 June, 11 2013 23:15:00+0000 BIVENT
555999 June, 12 2013 00:00:00+0000 June, 12 2013 00:20:00+0000 PS
11117777 June, 08 2013 00:00:00+0000 June, 08 2013 19:36:00+0000 SIMV/PRVC
11117777 June, 09 2013 00:00:00+0000 June, 08 2013 22:21:00+0000 PRVC/AC
11117777 June, 10 2013 00:00:00+0000 June, 10 2013 00:00:00+0000 NAVA
11117777 June, 11 2013 00:00:00+0000 June, 10 2013 00:20:00+0000 PS

You can do this as a correlated subquery -- in both Oracle and SQL Server, because this is almost standard SQL except for the top 1.
Here is the query:
select c.*,
       (select top 1 rvt.MEAS_VALUE
        from Recorded_Vent_Types rvt
        where rvt.PAT_ENC_CSN_ID = c.PAT_ENC_CSN_ID
          and rvt.MEAS_VALUE is not null
        order by (case when rvt.RECORDED_TIME <= c.CENSUS_TIME then 1 else 0
                  end) desc,
                 (case when rvt.RECORDED_TIME <= c.CENSUS_TIME then rvt.RECORDED_TIME
                  end) desc,
                 rvt.RECORDED_TIME asc
       ) as nearestVal
from Census c
The subquery returns one row, based on the order by, which is key to the query. It has three parts.
The first puts all recorded times at or before the census time at the beginning. The second sorts these by recorded time in descending order, so the latest one comes first; the third sorts the rest by ascending time. I would like to replace the last two with:
abs(c.census_time - rvt.recorded_time)
Because this is logically what it does. Alas that doesn't work, because abs() doesn't work on datetime. And then I'd have to use the datediff() function or a case statement, and it would start to look more complicated.
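For comparison, the "preference low" rule from the question can be sketched outside the database. This is only a minimal Python illustration, assuming the rows for one ID fit in memory; the names nearest_preference_low and ts and the tuple layout are hypothetical, and the sample rows are the 555999 rows from the question.

```python
from datetime import datetime

def nearest_preference_low(rows, census_time):
    """Pick the latest non-null row at or before census_time,
    falling back to the earliest non-null row after it."""
    usable = [r for r in rows if r[1] is not None]      # drop NULL values
    before = [r for r in usable if r[0] <= census_time]
    if before:
        return max(before, key=lambda r: r[0])          # closest on the low side
    after = [r for r in usable if r[0] > census_time]
    return min(after, key=lambda r: r[0]) if after else None

def ts(text):
    """Hypothetical helper: parse 'YYYY-MM-DD HH:MM' into a datetime."""
    return datetime.strptime(text, "%Y-%m-%d %H:%M")

# (RECORDED_TIME, MEAS_VALUE) rows for PAT_ENC_CSN_ID 555999
rows_555999 = [
    (ts("2013-06-08 00:36"), None),
    (ts("2013-06-08 22:21"), "PRVC/AC"),
    (ts("2013-06-09 00:10"), "SIMV/PRVC"),
    (ts("2013-06-11 23:15"), "BIVENT"),
    (ts("2013-06-12 00:00"), None),
    (ts("2013-06-12 00:20"), "PS"),
]

for census in ("2013-06-08 00:00", "2013-06-09 00:00",
               "2013-06-11 00:00", "2013-06-12 00:00"):
    print(census, nearest_preference_low(rows_555999, ts(census)))
```

The two-step choice (latest row on the low side, otherwise earliest row on the high side) is the same ordering that the three order by keys express in the SQL.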

Related

Create list based on actual month and year

I have code that should return the names of the months, together with the year, for the next 12 months.
E.g., it is now September, so the code should return the list of months with years through September 2023.
import datetime

month_names = "January February March April May June July August September October November December".split()
Year = '2022'
month_now = datetime.date.today().month
dict_of_dfs = {}
for i in range(month_now, len(month_names)):
    df_name = month_names[i]
    print(Year, i+1, '01')
This code returns only the months till the end of the year and I do not know how to change it.
The output should look like that:
2022 10 01
2022 11 01
2022 12 01
2023 01 01
2023 02 01
2023 03 01
...
2023 07 01
2023 08 01
2023 09 01
Check pd.date_range
pd.date_range(start = Year + '-' + str(month_now+1) + '-01', periods=12, freq='MS')
Another solution with pd.date_range:
pd.date_range(start=datetime.date.today().replace(day=1), periods=13, freq='MS')[1:]
Using Pendulum:
import pendulum
date_list = [pendulum.now().add(months=1).start_of("month").add(months=x).to_date_string() for x in range(12)]
print(date_list)
['2022-10-01', '2022-11-01', '2022-12-01', '2023-01-01', '2023-02-01', '2023-03-01', '2023-04-01', '2023-05-01', '2023-06-01', '2023-07-01', '2023-08-01', '2023-09-01']
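If neither pandas nor Pendulum is available, the year rollover can also be handled with the standard library alone. The following is just a sketch under that assumption; the function name next_month_starts is hypothetical, and the fixed date 2022-09-15 stands in for "today" so the output matches the example in the question.

```python
import datetime

def next_month_starts(today, count=12):
    """Return the first day of each of the `count` months after today's month."""
    starts = []
    year, month = today.year, today.month
    for _ in range(count):
        month += 1
        if month > 12:          # roll over into the next year
            month = 1
            year += 1
        starts.append(datetime.date(year, month, 1))
    return starts

# Fixed date used in place of datetime.date.today() for a reproducible example
for d in next_month_starts(datetime.date(2022, 9, 15)):
    print(d.year, f"{d.month:02d}", "01")
```

The loop in the question stops at December because it iterates over the month list once; carrying the year forward explicitly removes that limit.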

How to calculate median monthly from date of month table?

My dataset:
Date Num_orders
Mar 21 2019 69
Mar 22 2019 82
Mar 24 2019 312
Mar 25 2019 199
Mar 26 2019 2,629
Mar 27 2019 2,819
Mar 28 2019 3,123
Mar 29 2019 3,332
Mar 30 2019 1,863
Mar 31 2019 1,097
Apr 01 2019 1,578
Apr 02 2019 2,353
Apr 03 2019 2,768
Apr 04 2019 2,648
Apr 05 2019 3,192
Apr 06 2019 2,363
Apr 07 2019 1,578
Apr 08 2019 3,090
Apr 09 2019 3,814
Apr 10 2019 3,836
...
I need to calculate the monthly median number of orders from days of the same month:
The desired results:
Month Median_monthly
Mar 2019 1,863
Apr 2019 2,768
May 2019 2,876
Jun 2019 ...
...
I tried to use function date_trunc to extract month from the dataset then group by 'month' but it didn't work out. Thanks for your help, I use Google Bigquery (#standard) environment!
You probably tried to use PERCENTILE_CONT, which cannot be used with GROUP BY.
Try APPROX_QUANTILES(x, 100)[OFFSET(50)] instead. It should work with GROUP BY.
SELECT Month, APPROX_QUANTILES(Num_orders, 100)[OFFSET(50)] AS median
FROM myTable
GROUP BY Month
Alternatively, you can use PERCENTILE_CONT within a subquery:
SELECT
DISTINCT Month, median
FROM (
SELECT
Month,
PERCENTILE_CONT(Num_orders, 0.5) OVER(PARTITION BY Month) AS median
FROM myTable
)
This would often be done using DISTINCT:
SELECT DISTINCT DATE_TRUNC(date, MONTH) AS month,
PERCENTILE_CONT(Num_orders, 0.5) OVER (PARTITION BY DATE_TRUNC(date, MONTH)) AS median
FROM myTable;
Note: There are two percentile functions, PERCENTILE_CONT() and PERCENTILE_DISC(). They have different results when there is a "tie" in the middle of the data.
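To see what such a tie looks like, here is a small Python sketch using the ten March rows shown in the question. It assumes that an interpolating median behaves like PERCENTILE_CONT and that picking an actual data value behaves like PERCENTILE_DISC; the standard library functions are only an analogy, not the BigQuery implementation.

```python
import statistics

# Num_orders for the March 2019 rows listed in the question
march = [69, 82, 312, 199, 2629, 2819, 3123, 3332, 1863, 1097]

# With an even number of rows the two middle values are 1097 and 1863.
interpolated = statistics.median(march)       # averages the two middle values
lower = statistics.median_low(march)          # lower of the two middle values
upper = statistics.median_high(march)         # higher of the two middle values

print(interpolated, lower, upper)  # 1480.0 1097 1863
```

Note that the March figure in the desired results (1,863) is the upper middle value rather than the interpolated one, so which function is appropriate depends on which definition of the median is wanted.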

SQL Grouping cube and pivot

I'm trying to run the following query, which produces a table grouped by year, month and site, and then pivots the sites into columns:
SELECT * FROM
(
SELECT
DECODE(GROUPING(TO_CHAR(TM.TIMESTAMP,'YYYY'))
,0, TO_CHAR(TM.TIMESTAMP,'YYYY')
,1, 'TOTAL') AS "YEAR",
DECODE(GROUPING(TO_CHAR(TM.TIMESTAMP,'MM'))
,0, TO_CHAR(TM.TIMESTAMP,'MM')
,1, 'TOTAL') AS "MONTH",
DECODE(GROUPING(TS.CODIGO5)
,0, TS.CODIGO5
,1, 'TOTAL') AS BU,
SUM(TM.KWHGEN) AS GEN
FROM T_MEDIDAS_CO TM
JOIN T_Sede TS ON TM.id_sede=TS.id_sede
WHERE TO_CHAR(TM.TIMESTAMP,'YYYY') IN (2015,2014)
AND TS.CODIGO5 IN ('FINSI', 'FINOC')
GROUP BY CUBE (TO_CHAR(TM.TIMESTAMP,'YYYY'), TO_CHAR(TM.TIMESTAMP,'MM'), TS.CODIGO5)
ORDER BY TO_CHAR(TM.TIMESTAMP,'YYYY') DESC, TO_CHAR(TM.TIMESTAMP,'MM') DESC, 3
)
PIVOT
(
SUM(GEN)
FOR BU IN ('FINCI' AS FINCI,'FINSI' AS FINSI, 'FINOC' AS FINOC, 'TOTAL' AS TOTAL)
)
ORDER BY "YEAR" DESC, "MONTH" DESC
to obtain this result
YEAR MONTH FINCI FINOC TOTAL
2015 12 110376,17 109991,55 220367,72
2015 11 92032,56 97938,09 189970,65
2015 10 77668,67 79273,98 156942,65
2015 09 87079,46 91203,73 178283,19
2015 08 99992,38 100220,24 200212,62
2015 07 142430 133979,74 276409,74
2015 06 107006,73 104320,96 211327,69
2015 05 86264 90985,62 177249,62
2015 04 85838,41 87147,74 172986,15
2015 03 106178,39 106342,4 212520,79
2015 02 125007,65 122790,76 247798,41
2015 01 134934,67 135897,7 270832,37
2015 TOTAL 1254809,09 1260092,51 2514901,6
2014 12 121185,25 122014,9 243200,15
2014 11 94682,9 94221,47 188904,37
2014 10 87212,59 92222,92 179435,51
2014 09 97306,19 100701,93 198008,12
2014 08 97738,26 101901,88 199640,14
2014 07 113242,07 117496,84 230738,91
2014 06 98234,69 98092,2 196326,89
2014 05 91202,74 102214,94 193417,68
2014 04 88517,65 103756,83 192274,48
2014 03 107541,53 119236,48 226778,01
2014 02 127880,75 131451,38 259332,13
2014 01 141381,35 143836,44 285217,79
2014 TOTAL 1266125,97 1327148,21 2593274,18
TOTAL 12 231561,42 232006,45 463567,87
TOTAL 11 186715,46 192159,56 378875,02
TOTAL 10 164881,26 171496,9 336378,16
TOTAL 09 184385,65 191905,66 376291,31
TOTAL 08 197730,64 202122,12 399852,76
TOTAL 07 255672,07 251476,58 507148,65
TOTAL 06 205241,42 202413,16 407654,58
TOTAL 05 177466,74 193200,56 370667,3
TOTAL 04 174356,06 190904,57 365260,63
TOTAL 03 213719,92 225578,88 439298,8
TOTAL 02 252888,4 254242,14 507130,54
TOTAL 01 276316,02 279734,14 556050,16
TOTAL TOTAL 2520935,06 2587240,72 5108175,78
But I don't need the TOTAL | MONTH rows. How can I remove them?
Thanks a lot

How to change start date in a table to a pair of start date and end date using SQL

The title must be confusing, but the thing I am trying to do is very easy to understand with an example. I have a table like this:
Code Date_ Ratio
73245 Jan 1 1975 12:00AM 10
73245 Apr 18 2006 12:00AM 4
73245 Dec 26 2007 12:00AM 10
73245 Jan 30 2009 12:00AM 4
73245 Apr 21 2011 12:00AM 2
Basically for each security it gives some ratio for it with a date when the ratio starts to be effective. This table will be much easier to use if instead of just having a start date, it has a pair of start date and end date, like the following:
Code StartDate_ EndDate_ Ratio
73245 Jan 1 1975 12:00AM Apr 18 2006 12:00AM 10
73245 Apr 18 2006 12:00AM Dec 26 2007 12:00AM 4
73245 Dec 26 2007 12:00AM Jan 30 2009 12:00AM 10
73245 Jan 30 2009 12:00AM Apr 21 2011 12:00AM 4
73245 Apr 21 2011 12:00AM Dec 31 2049 12:00AM (or some arbitrary date in the far future) 2
How do I transform the original table to the table I want using SQL statements? I have little experience with SQL and I could not figure how.
Please help! Thanks!
In SQL Server 2012:
SELECT code,
date_ AS startDate,
LEAD(date_) OVER (PARTITION BY code ORDER BY date_) AS endDate,
ratio
FROM mytable
In SQL Server 2005 and 2008:
WITH q AS
(
SELECT *,
ROW_NUMBER() OVER (PARTITION BY code ORDER BY date_) AS rn
FROM mytable
)
SELECT q1.code, q1.date_ AS startDate, q2.date_ AS endDate, q1.ratio
FROM q q1
LEFT JOIN
q q2
ON q2.code = q1.code
AND q2.rn = q1.rn + 1
Maybe it would also be possible to use OUTER APPLY, something like:
SELECT t1o.Code, t1o.Date_ AS StartDate_, ISNULL(t2.EndDate_, CAST('20491231' AS DATETIME)) AS EndDate_, t1o.Ratio
FROM t1 AS t1o
OUTER APPLY
(
SELECT TOP 1 Date_ AS EndDate_
FROM t1
WHERE t1.Code = t1o.Code AND t1.Date_ > t1o.Date_
ORDER BY t1.Date_ ASC
) AS t2
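All three queries pair each row with the next start date for the same code. As a rough illustration of that idea outside SQL, here is a Python sketch; the function name to_intervals and the FAR_FUTURE constant are hypothetical, and the rows are the sample data from the question.

```python
from datetime import date

FAR_FUTURE = date(2049, 12, 31)  # placeholder end date for the last row of each code

def to_intervals(rows):
    """Turn (code, start_date, ratio) rows into (code, start, end, ratio) rows."""
    rows = sorted(rows)  # order by code, then start date
    result = []
    for i, (code, start, ratio) in enumerate(rows):
        nxt = rows[i + 1] if i + 1 < len(rows) else None
        # The end date is the next start date of the same code, if there is one
        end = nxt[1] if nxt is not None and nxt[0] == code else FAR_FUTURE
        result.append((code, start, end, ratio))
    return result

sample = [
    (73245, date(1975, 1, 1), 10),
    (73245, date(2006, 4, 18), 4),
    (73245, date(2007, 12, 26), 10),
    (73245, date(2009, 1, 30), 4),
    (73245, date(2011, 4, 21), 2),
]

for row in to_intervals(sample):
    print(row)
```

Looking one row ahead after sorting is what LEAD does; the ROW_NUMBER self-join and the OUTER APPLY version reach the same neighbor in other ways.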

How get dates on week day?

I want to ask how to get the dates of the days in a given week.
Green marks a weekday.
Red marks Sunday.
So when I pass a week number to the SQL command (like 27) for the year 2012,
it should show the dates 2012-07-02 through 2012-07-08.
This query uses a single parameter @weekno as input and returns the 7 days in that week, taking Monday as the first day of the week. The definition of WeekNo does not follow SQL Server's DatePart(Week) because that depends on @@DATEFIRST. This doesn't.
The dateadd.. line is an expression that returns the first Monday of the year. I got it from here. The line above it just adds the weeks to it and 0-6 to create 7 days. To verify this is correct for any year, change CURRENT_TIMESTAMP in the query to a date, such as 20180708. FYI, 1-Jan-2018 is a Monday.
declare @weekno int = 27;
select
(@weekno-1)*7+v.num+
dateadd(dd,(datediff(dd,0,dateadd(yy,datediff(yy,0,CURRENT_TIMESTAMP),6))/7)*7,0)
from (values(0),(1),(2),(3),(4),(5),(6))v(num)
order by num
-- results
July, 02 2012 00:00:00+0000
July, 03 2012 00:00:00+0000
July, 04 2012 00:00:00+0000
July, 05 2012 00:00:00+0000
July, 06 2012 00:00:00+0000
July, 07 2012 00:00:00+0000
July, 08 2012 00:00:00+0000
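The same week definition (week 1 starts on the first Monday of the year) can be sketched in Python to check the arithmetic. The function name week_dates is hypothetical, and the year is passed in explicitly instead of being taken from the current date.

```python
from datetime import date, timedelta

def week_dates(year, week_no):
    """Return the 7 dates of week `week_no`, where week 1 starts on the
    first Monday of `year`."""
    jan7 = date(year, 1, 7)
    # Stepping back to the Monday on or before January 7 gives the first Monday
    first_monday = jan7 - timedelta(days=jan7.weekday())
    start = first_monday + timedelta(weeks=week_no - 1)
    return [start + timedelta(days=n) for n in range(7)]

for d in week_dates(2012, 27):
    print(d.isoformat())
```

Because the first Monday always falls between January 1 and January 7, rounding January 7 down to a Monday is enough to find it, which is what the datediff/7*7 expression does in the query.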