Nearest Neighbor for Point in Time - sql
Say I have two tables (SQL Fiddle). One has values recorded at various timestamps; the other lists IDs and the datetimes at which to sample the nearest value. Using something similar to Kevin Meade's NEAREST NEIGHBOR PREFERENCE LOW (but in SQL Server 2008), I want to find, for each census row, the non-null value for that ID whose recorded time is closest to the census time without being after it. If a row matches the census time exactly, use that one (unless its value is null). If there is no row before the census time, use the row that is closest to the census time but after it.
First Table:
CREATE TABLE Recorded_Vent_Types
([PAT_ENC_CSN_ID] int, [RECORDED_TIME] datetime, [MEAS_VALUE] varchar(9));
INSERT INTO Recorded_Vent_Types
([PAT_ENC_CSN_ID], [RECORDED_TIME], [MEAS_VALUE])
VALUES
(11117777, '2013-06-08 19:36:00.000', 'SIMV/PRVC'),
(11117777, '2013-06-08 22:21:00.000', 'PRVC/AC'),
(11117777, '2013-06-09 00:10:00.000', NULL),
(11117777, '2013-06-09 03:00:00.000', 'SIMV/PRVC'),
(11117777, '2013-06-09 23:59:00.000', 'SIMV/PRVC'),
(11117777, '2013-06-10 00:00:00.000', 'NAVA'),
(11117777, '2013-06-10 00:20:00.000', 'PS'),
(11117777, '2013-06-10 00:25:00.000', NULL),
(555999, '2013-06-08 00:36:00.000', NULL),
(555999, '2013-06-08 22:21:00.000', 'PRVC/AC'),
(555999, '2013-06-09 00:10:00.000', 'SIMV/PRVC'),
(555999, '2013-06-11 23:15:00.000', 'BIVENT'),
(555999, '2013-06-12 00:00:00.000', NULL),
(555999, '2013-06-12 00:20:00.000', 'PS');
Second Table:
CREATE TABLE Census
([PAT_ENC_CSN_ID] int, [CENSUS_TIME] datetime);
INSERT INTO Census
([PAT_ENC_CSN_ID], [CENSUS_TIME])
VALUES
(11117777, '2013-06-08 00:00:00'),
(11117777, '2013-06-09 00:00:00'),
(11117777, '2013-06-10 00:00:00'),
(11117777, '2013-06-11 00:00:00'),
(555999, '2013-06-08 00:00:00'),
(555999, '2013-06-09 00:00:00'),
(555999, '2013-06-11 00:00:00'),
(555999, '2013-06-12 00:00:00');
Here is Mr. Meade's Oracle code for something similar, within one table, for a given ID:
select *
from claim_history
where claim_id = 1
and status_date =
(
select min(status_date)
from (
select max(status_date) status_date
from claim_history
where claim_id = 1
and status_date <= sysdate-3
union all
select min(status_date)
from claim_history
where claim_id = 1
and status_date > sysdate-3
)
)
/
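A minimal sketch of the same "preference low" rule applied once per census row, run on the question's sample data with Python's built-in sqlite3 module. Assumptions: SQLite stands in for SQL Server, timestamps are stored as ISO-8601 text without milliseconds (so string comparison follows time order), and COALESCE over two correlated subqueries replaces Meade's MIN over a UNION ALL. The helpers `load_sample` and `nearest_prefer_low` are hypothetical names.

```python
import sqlite3

# Question's sample data. Assumption: ISO-8601 text without milliseconds,
# so SQLite's string comparison follows chronological order.
VENT = [
    (11117777, "2013-06-08 19:36:00", "SIMV/PRVC"),
    (11117777, "2013-06-08 22:21:00", "PRVC/AC"),
    (11117777, "2013-06-09 00:10:00", None),
    (11117777, "2013-06-09 03:00:00", "SIMV/PRVC"),
    (11117777, "2013-06-09 23:59:00", "SIMV/PRVC"),
    (11117777, "2013-06-10 00:00:00", "NAVA"),
    (11117777, "2013-06-10 00:20:00", "PS"),
    (11117777, "2013-06-10 00:25:00", None),
    (555999, "2013-06-08 00:36:00", None),
    (555999, "2013-06-08 22:21:00", "PRVC/AC"),
    (555999, "2013-06-09 00:10:00", "SIMV/PRVC"),
    (555999, "2013-06-11 23:15:00", "BIVENT"),
    (555999, "2013-06-12 00:00:00", None),
    (555999, "2013-06-12 00:20:00", "PS"),
]
CENSUS = [(11117777, f"2013-06-{d:02d} 00:00:00") for d in (8, 9, 10, 11)] + [
    (555999, f"2013-06-{d:02d} 00:00:00") for d in (8, 9, 11, 12)
]


def load_sample():
    """Build an in-memory database holding both tables (hypothetical helper)."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE Recorded_Vent_Types "
                "(PAT_ENC_CSN_ID INTEGER, RECORDED_TIME TEXT, MEAS_VALUE TEXT)")
    con.execute("CREATE TABLE Census (PAT_ENC_CSN_ID INTEGER, CENSUS_TIME TEXT)")
    con.executemany("INSERT INTO Recorded_Vent_Types VALUES (?, ?, ?)", VENT)
    con.executemany("INSERT INTO Census VALUES (?, ?)", CENSUS)
    return con


def nearest_prefer_low(con):
    # Latest non-null reading at or before the census time; when there is
    # none, COALESCE falls through to the earliest non-null reading after it.
    sql = """
        SELECT c.PAT_ENC_CSN_ID, c.CENSUS_TIME, r.RECORDED_TIME, r.MEAS_VALUE
        FROM Census c
        JOIN Recorded_Vent_Types r
          ON r.PAT_ENC_CSN_ID = c.PAT_ENC_CSN_ID
         AND r.MEAS_VALUE IS NOT NULL
         AND r.RECORDED_TIME = COALESCE(
               (SELECT MAX(x.RECORDED_TIME) FROM Recorded_Vent_Types x
                 WHERE x.PAT_ENC_CSN_ID = c.PAT_ENC_CSN_ID
                   AND x.MEAS_VALUE IS NOT NULL
                   AND x.RECORDED_TIME <= c.CENSUS_TIME),
               (SELECT MIN(x.RECORDED_TIME) FROM Recorded_Vent_Types x
                 WHERE x.PAT_ENC_CSN_ID = c.PAT_ENC_CSN_ID
                   AND x.MEAS_VALUE IS NOT NULL
                   AND x.RECORDED_TIME > c.CENSUS_TIME))
        ORDER BY c.PAT_ENC_CSN_ID, c.CENSUS_TIME
    """
    return con.execute(sql).fetchall()


for row in nearest_prefer_low(load_sample()):
    print(row)
```

The inner JOIN drops any census row whose ID has no non-null reading at all; a LEFT JOIN with the same ON conditions would keep such rows with NULLs instead.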
My Desired Result set:
PAT_ENC_CSN_ID CENSUS_TIME RECORDED_TIME MEAS_VALUE
555999 June, 08 2013 00:00:00+0000 June, 08 2013 22:21:00+0000 PRVC/AC
555999 June, 09 2013 00:00:00+0000 June, 08 2013 22:21:00+0000 PRVC/AC
555999 June, 11 2013 00:00:00+0000 June, 09 2013 00:10:00+0000 SIMV/PRVC
555999 June, 12 2013 00:00:00+0000 June, 11 2013 23:15:00+0000 BIVENT
11117777 June, 08 2013 00:00:00+0000 June, 08 2013 19:36:00+0000 SIMV/PRVC
11117777 June, 09 2013 00:00:00+0000 June, 08 2013 22:21:00+0000 PRVC/AC
11117777 June, 10 2013 00:00:00+0000 June, 10 2013 00:00:00+0000 NAVA
11117777 June, 11 2013 00:00:00+0000 June, 10 2013 00:20:00+0000 PS
Gordon Linoff gave me the idea of using the absolute value of the date difference between census times and recorded times. This led me to modify bobs's solution:
SELECT * FROM
(
SELECT rvt.PAT_ENC_CSN_ID, CENSUS_TIME, RECORDED_TIME, MEAS_VALUE, ABS(DATEDIFF(s, c.CENSUS_TIME, RECORDED_TIME)) diff,
ROW_NUMBER() OVER (PARTITION BY rvt.PAT_ENC_CSN_ID, c.CENSUS_TIME ORDER BY ABS(DATEDIFF(s, c.CENSUS_TIME, RECORDED_TIME))) AS SEQUENCE
FROM Recorded_Vent_Types rvt join Census c on rvt.PAT_ENC_CSN_ID=c.PAT_ENC_CSN_ID
WHERE MEAS_VALUE IS NOT NULL
) as m
WHERE SEQUENCE = 1
ORDER BY PAT_ENC_CSN_ID,CENSUS_TIME
;
But this returns the (absolute) closest recorded time, with no preference given to a recorded time prior to the census time.
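One way to add the missing preference is to sort first on whether a reading comes after the census time, and only then on the absolute distance. In the T-SQL query above, that would mean putting CASE WHEN RECORDED_TIME <= c.CENSUS_TIME THEN 0 ELSE 1 END ahead of ABS(DATEDIFF(s, c.CENSUS_TIME, RECORDED_TIME)) in the ROW_NUMBER() ORDER BY. The sketch below runs that idea in Python's sqlite3 module on the question's data; it assumes SQLite 3.25 or later (for window functions), ISO-8601 text timestamps without milliseconds, and julianday() as SQLite's stand-in for DATEDIFF. `load_sample` and `ranked_nearest` are hypothetical helper names.

```python
import sqlite3

# Question's sample data (assumption: ISO-8601 text, no milliseconds).
VENT = [
    (11117777, "2013-06-08 19:36:00", "SIMV/PRVC"),
    (11117777, "2013-06-08 22:21:00", "PRVC/AC"),
    (11117777, "2013-06-09 00:10:00", None),
    (11117777, "2013-06-09 03:00:00", "SIMV/PRVC"),
    (11117777, "2013-06-09 23:59:00", "SIMV/PRVC"),
    (11117777, "2013-06-10 00:00:00", "NAVA"),
    (11117777, "2013-06-10 00:20:00", "PS"),
    (11117777, "2013-06-10 00:25:00", None),
    (555999, "2013-06-08 00:36:00", None),
    (555999, "2013-06-08 22:21:00", "PRVC/AC"),
    (555999, "2013-06-09 00:10:00", "SIMV/PRVC"),
    (555999, "2013-06-11 23:15:00", "BIVENT"),
    (555999, "2013-06-12 00:00:00", None),
    (555999, "2013-06-12 00:20:00", "PS"),
]
CENSUS = [(11117777, f"2013-06-{d:02d} 00:00:00") for d in (8, 9, 10, 11)] + [
    (555999, f"2013-06-{d:02d} 00:00:00") for d in (8, 9, 11, 12)
]


def load_sample():
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE Recorded_Vent_Types "
                "(PAT_ENC_CSN_ID INTEGER, RECORDED_TIME TEXT, MEAS_VALUE TEXT)")
    con.execute("CREATE TABLE Census (PAT_ENC_CSN_ID INTEGER, CENSUS_TIME TEXT)")
    con.executemany("INSERT INTO Recorded_Vent_Types VALUES (?, ?, ?)", VENT)
    con.executemany("INSERT INTO Census VALUES (?, ?)", CENSUS)
    return con


def ranked_nearest(con):
    sql = """
        SELECT PAT_ENC_CSN_ID, CENSUS_TIME, RECORDED_TIME, MEAS_VALUE
        FROM (
            SELECT c.PAT_ENC_CSN_ID AS PAT_ENC_CSN_ID, c.CENSUS_TIME AS CENSUS_TIME,
                   r.RECORDED_TIME AS RECORDED_TIME, r.MEAS_VALUE AS MEAS_VALUE,
                   ROW_NUMBER() OVER (
                       PARTITION BY c.PAT_ENC_CSN_ID, c.CENSUS_TIME
                       ORDER BY
                           -- 0 = at or before the census time (preferred), 1 = after
                           CASE WHEN r.RECORDED_TIME <= c.CENSUS_TIME THEN 0 ELSE 1 END,
                           -- then the closest reading within that group
                           ABS(julianday(c.CENSUS_TIME) - julianday(r.RECORDED_TIME))
                   ) AS seq
            FROM Recorded_Vent_Types r
            JOIN Census c ON r.PAT_ENC_CSN_ID = c.PAT_ENC_CSN_ID
            WHERE r.MEAS_VALUE IS NOT NULL
        ) AS m
        WHERE seq = 1
        ORDER BY PAT_ENC_CSN_ID, CENSUS_TIME
    """
    return con.execute(sql).fetchall()


for row in ranked_nearest(load_sample()):
    print(row)
```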
Result:
PAT_ENC_CSN_ID CENSUS_TIME RECORDED_TIME MEAS_VALUE
555999 June, 08 2013 00:00:00+0000 June, 08 2013 22:21:00+0000 PRVC/AC
555999 June, 09 2013 00:00:00+0000 June, 09 2013 00:10:00+0000 SIMV/PRVC
555999 June, 11 2013 00:00:00+0000 June, 11 2013 23:15:00+0000 BIVENT
555999 June, 12 2013 00:00:00+0000 June, 12 2013 00:20:00+0000 PS
11117777 June, 08 2013 00:00:00+0000 June, 08 2013 19:36:00+0000 SIMV/PRVC
11117777 June, 09 2013 00:00:00+0000 June, 08 2013 22:21:00+0000 PRVC/AC
11117777 June, 10 2013 00:00:00+0000 June, 10 2013 00:00:00+0000 NAVA
11117777 June, 11 2013 00:00:00+0000 June, 10 2013 00:20:00+0000 PS
You can do this with a correlated subquery in both Oracle and SQL Server, because it is almost standard SQL apart from the TOP 1.
Here is the query:
select c.*,
       (select top 1 rvt.MEAS_VALUE
        from Recorded_Vent_Types rvt
        where rvt.PAT_ENC_CSN_ID = c.PAT_ENC_CSN_ID
          and rvt.MEAS_VALUE is not null
        order by (case when rvt.recorded_time <= c.census_time then 1 else 0
                  end) desc,
                 (case when rvt.recorded_time <= c.census_time then rvt.recorded_time
                  end) desc,
                 rvt.recorded_time asc
       ) as nearestVal
from census c
The subquery returns one row, chosen by the order by, which is the key to the query. The order by has three parts.
The first puts all recorded times at or before the census time at the beginning. The second sorts these by recorded time in descending order (it is NULL for the later rows, so it does not reorder them), and the third sorts the remaining rows by recorded time in ascending order. I would like to replace the last two with:
abs(c.census_time - rvt.recorded_time)
because that is logically what they do. Alas, that doesn't work, because abs() doesn't accept a datetime. I'd have to use the datediff() function (for example abs(datediff(s, c.census_time, rvt.recorded_time))) or a case expression instead, and the query would start to look more complicated.
Related
Create list based on actual month and year
I have code that should return the names of the months, with the year, for the next 12 months starting from now. For example, it is now September, so the code should return the list of months (with years) up to September 2023.
import datetime
month_names = "January February March April May June July August September October November December".split()
Year = '2022'
month_now = datetime.date.today().month
dict_of_dfs = {}
for i in range(month_now,len(month_names)):
    df_name = month_names[i]
    print(Year,i+1,'01')
This code returns only the months until the end of the year, and I do not know how to change it. The output should look like this:
2022 10 01
2022 11 01
2022 12 01
2023 01 01
2023 02 01
2023 03 01
...
2023 07 01
2023 08 01
2023 09 01
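Check pd.date_range:
import pandas as pd
pd.date_range(start = Year + '-' + str(month_now+1) + '-01', periods=12, freq='MS')
This assumes month_now is between 1 and 11; in December, month_now+1 is 13, which is not a valid month.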
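Another solution with pd.date_range, starting from today's date rather than the month number:
import datetime
import pandas as pd
today = datetime.date.today()
pd.date_range(start=today.replace(day=1), periods=13, freq='MS')[1:]
This starts at the first day of the current month and drops that first entry, leaving the next 12 month starts.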
Using Pendulum:
import pendulum
date_list = [pendulum.now().add(months=1).start_of("month").add(months=x).to_date_string() for x in range(12)]
print(date_list)
['2022-10-01', '2022-11-01', '2022-12-01', '2023-01-01', '2023-02-01', '2023-03-01', '2023-04-01', '2023-05-01', '2023-06-01', '2023-07-01', '2023-08-01', '2023-09-01']
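For comparison, a standard-library-only sketch (no pandas or Pendulum) that produces the "YYYY MM 01" lines the question asks for. The function name `next_twelve_months` is hypothetical; the trick is to count months from year 0 so that divmod handles the year rollover, including when the current month is December.

```python
from datetime import date


def next_twelve_months(today):
    """'YYYY MM 01' strings for the 12 months after today's month."""
    lines = []
    for k in range(1, 13):
        # Total months since year 0, shifted by k; divmod splits it back
        # into (year, zero-based month), rolling over the year as needed.
        year, month_index = divmod(today.year * 12 + (today.month - 1) + k, 12)
        lines.append(f"{year} {month_index + 1:02d} 01")
    return lines


for line in next_twelve_months(date.today()):
    print(line)
```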
How to calculate median monthly from date of month table?
My dataset:
Date Num_orders
Mar 21 2019 69
Mar 22 2019 82
Mar 24 2019 312
Mar 25 2019 199
Mar 26 2019 2,629
Mar 27 2019 2,819
Mar 28 2019 3,123
Mar 29 2019 3,332
Mar 30 2019 1,863
Mar 31 2019 1,097
Apr 01 2019 1,578
Apr 02 2019 2,353
Apr 03 2019 2,768
Apr 04 2019 2,648
Apr 05 2019 3,192
Apr 06 2019 2,363
Apr 07 2019 1,578
Apr 08 2019 3,090
Apr 09 2019 3,814
Apr 10 2019 3,836
...
I need to calculate the monthly median number of orders from the days of each month. The desired results:
Month Median_monthly
Mar 2019 1,863
Apr 2019 2,768
May 2019 2,876
Jun 2019 ...
...
I tried to use the DATE_TRUNC function to extract the month from the date and then group by month, but it didn't work. Thanks for your help! I use Google BigQuery (standard SQL).
You probably tried to use PERCENTILE_CONT, which cannot be used with GROUP BY. Try APPROX_QUANTILES(x, 100)[OFFSET(50)] instead (the result is approximate); it works with GROUP BY:
SELECT Month, APPROX_QUANTILES(Num_orders, 100)[OFFSET(50)] AS median FROM myTable GROUP BY Month
Alternatively, you can use PERCENTILE_CONT within a subquery:
SELECT DISTINCT Month, median FROM ( SELECT Month, PERCENTILE_CONT(Num_orders, 0.5) OVER (PARTITION BY Month) AS median FROM myTable )
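This would often be done using DISTINCT:
SELECT DISTINCT DATE_TRUNC(date, MONTH) AS month, PERCENTILE_CONT(Num_orders, 0.5) OVER (PARTITION BY DATE_TRUNC(date, MONTH)) AS median FROM myTable;
Note: there are two percentile functions, PERCENTILE_CONT() and PERCENTILE_DISC(). They can return different results when the middle of the data falls between two values (for example, with an even number of rows): PERCENTILE_CONT() interpolates between them, while PERCENTILE_DISC() returns an actual value from the data.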
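To see how the two percentile definitions differ, here is a small Python sketch applied to the ten March 2019 rows shown in the question (later months are elided there, so only March is used). It assumes the usual definitions as I understand BigQuery documents them: PERCENTILE_CONT interpolates linearly between ranks, and PERCENTILE_DISC returns the first sorted value whose cumulative distribution reaches the percentile. The helper names are hypothetical, and grouping on "%b %Y" plays the role of DATE_TRUNC(date, MONTH).

```python
import math
from collections import defaultdict
from datetime import datetime

# The March 2019 rows shown in the question (comma separators removed).
ROWS = [
    ("Mar 21 2019", 69), ("Mar 22 2019", 82), ("Mar 24 2019", 312),
    ("Mar 25 2019", 199), ("Mar 26 2019", 2629), ("Mar 27 2019", 2819),
    ("Mar 28 2019", 3123), ("Mar 29 2019", 3332), ("Mar 30 2019", 1863),
    ("Mar 31 2019", 1097),
]


def percentile_cont(values, p):
    """Linear interpolation between the two ranks around p."""
    s = sorted(values)
    pos = p * (len(s) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def percentile_disc(values, p):
    """First sorted value whose cumulative distribution (i+1)/n reaches p."""
    s = sorted(values)
    n = len(s)
    for i, v in enumerate(s):
        if (i + 1) / n >= p:
            return v
    return s[-1]


def monthly_medians(rows):
    groups = defaultdict(list)
    for day, orders in rows:
        # Key on month and year only, like DATE_TRUNC(date, MONTH)
        key = datetime.strptime(day, "%b %d %Y").strftime("%b %Y")
        groups[key].append(orders)
    return {m: (percentile_cont(v, 0.5), percentile_disc(v, 0.5))
            for m, v in groups.items()}


print(monthly_medians(ROWS))
```

With ten rows the median falls between the 5th and 6th sorted values (1097 and 1863), so the interpolated and discrete answers differ.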
SQL Grouping cube and pivot
I'm trying to run the following query, where I get a table grouped by year, month and site, and then pivot the sites into columns:
SELECT *
FROM (
  SELECT DECODE(GROUPING(TO_CHAR(TM.TIMESTAMP,'YYYY')), 0, TO_CHAR(TM.TIMESTAMP,'YYYY'), 1, 'TOTAL') AS "YEAR",
         DECODE(GROUPING(TO_CHAR(TM.TIMESTAMP,'MM')), 0, TO_CHAR(TM.TIMESTAMP,'MM'), 1, 'TOTAL') AS "MONTH",
         DECODE(GROUPING(TS.CODIGO5), 0, TS.CODIGO5, 1, 'TOTAL') AS BU,
         SUM(TM.KWHGEN) AS GEN
  FROM T_MEDIDAS_CO TM
  JOIN T_Sede TS ON TM.id_sede = TS.id_sede
  WHERE TO_CHAR(TM.TIMESTAMP,'YYYY') IN (2015,2014)
    AND TS.CODIGO5 IN ('FINSI', 'FINOC')
  GROUP BY CUBE (TO_CHAR(TM.TIMESTAMP,'YYYY'), TO_CHAR(TM.TIMESTAMP,'MM'), TS.CODIGO5)
  ORDER BY TO_CHAR(TM.TIMESTAMP,'YYYY') DESC, TO_CHAR(TM.TIMESTAMP,'MM') DESC, 3
)
PIVOT (
  SUM(GEN) FOR BU IN ('FINCI' AS FINCI, 'FINSI' AS FINSI, 'FINOC' AS FINOC, 'TOTAL' AS TOTAL)
)
ORDER BY "YEAR" DESC, "MONTH" DESC
to obtain this result:
YEAR MONTH FINCI FINOC TOTAL
2015 12 110376,17 109991,55 220367,72
2015 11 92032,56 97938,09 189970,65
2015 10 77668,67 79273,98 156942,65
2015 09 87079,46 91203,73 178283,19
2015 08 99992,38 100220,24 200212,62
2015 07 142430 133979,74 276409,74
2015 06 107006,73 104320,96 211327,69
2015 05 86264 90985,62 177249,62
2015 04 85838,41 87147,74 172986,15
2015 03 106178,39 106342,4 212520,79
2015 02 125007,65 122790,76 247798,41
2015 01 134934,67 135897,7 270832,37
2015 TOTAL 1254809,09 1260092,51 2514901,6
2014 12 121185,25 122014,9 243200,15
2014 11 94682,9 94221,47 188904,37
2014 10 87212,59 92222,92 179435,51
2014 09 97306,19 100701,93 198008,12
2014 08 97738,26 101901,88 199640,14
2014 07 113242,07 117496,84 230738,91
2014 06 98234,69 98092,2 196326,89
2014 05 91202,74 102214,94 193417,68
2014 04 88517,65 103756,83 192274,48
2014 03 107541,53 119236,48 226778,01
2014 02 127880,75 131451,38 259332,13
2014 01 141381,35 143836,44 285217,79
2014 TOTAL 1266125,97 1327148,21 2593274,18
TOTAL 12 231561,42 232006,45 463567,87
TOTAL 11 186715,46 192159,56 378875,02
TOTAL 10 164881,26 171496,9 336378,16
TOTAL 09 184385,65 191905,66 376291,31
TOTAL 08 197730,64 202122,12 399852,76
TOTAL 07 255672,07 251476,58 507148,65
TOTAL 06 205241,42 202413,16 407654,58
TOTAL 05 177466,74 193200,56 370667,3
TOTAL 04 174356,06 190904,57 365260,63
TOTAL 03 213719,92 225578,88 439298,8
TOTAL 02 252888,4 254242,14 507130,54
TOTAL 01 276316,02 279734,14 556050,16
TOTAL TOTAL 2520935,06 2587240,72 5108175,78
But I don't need the TOTAL | MONTH rows (YEAR = TOTAL with a specific month). How can I fix it? Thanks a lot.
How to change start date in a table to a pair of start date and end date using SQL
The title may be confusing, but what I am trying to do is easy to understand with an example. I have a table like this:
Code Date_ Ratio
73245 Jan 1 1975 12:00AM 10
73245 Apr 18 2006 12:00AM 4
73245 Dec 26 2007 12:00AM 10
73245 Jan 30 2009 12:00AM 4
73245 Apr 21 2011 12:00AM 2
For each security, it gives a ratio together with the date on which that ratio becomes effective. The table would be much easier to use if, instead of just a start date, it had a pair of start and end dates, like this:
Code StartDate_ EndDate_ Ratio
73245 Jan 1 1975 12:00AM Apr 18 2006 12:00AM 10
73245 Apr 18 2006 12:00AM Dec 26 2007 12:00AM 4
73245 Dec 26 2007 12:00AM Jan 30 2009 12:00AM 10
73245 Jan 30 2009 12:00AM Apr 21 2011 12:00AM 4
73245 Apr 21 2011 12:00AM Dec 31 2049 12:00AM (or some arbitrary date in the far future) 2
How do I transform the original table into the one I want using SQL statements? I have little experience with SQL and could not figure out how. Please help! Thanks!
In SQL Server 2012:
SELECT code, date_ AS startDate,
       LEAD(date_) OVER (PARTITION BY code ORDER BY date_) AS endDate,
       ratio
FROM mytable
In SQL Server 2005 and 2008:
WITH q AS (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY code ORDER BY date_) AS rn
    FROM mytable
)
SELECT q1.code, q1.date_ AS startDate, q2.date_ AS endDate, q1.ratio
FROM q q1
LEFT JOIN q q2
  ON q2.code = q1.code
 AND q2.rn = q1.rn + 1
Maybe it would also be possible to use OUTER APPLY, something like this (t1 is the table):
SELECT t1o.Code, t1o.Date_ AS StartDate_,
       ISNULL(t2.EndDate_, CAST('20491231' AS DATETIME)) AS EndDate_,
       t1o.Ratio
FROM t1 AS t1o
OUTER APPLY (
    SELECT TOP 1 Date_ AS EndDate_
    FROM t1
    WHERE t1.Code = t1o.Code
      AND t1.Date_ > t1o.Date_
    ORDER BY t1.Date_ ASC
) AS t2
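A sketch that checks the LEAD and ROW_NUMBER self-join approaches against each other on the question's rows, using Python's sqlite3 module. Assumptions: SQLite 3.25 or later (window functions), dates stored as ISO-8601 text, and the question's Dec 31 2049 as the end date of the last row, supplied through LEAD's third (default) argument or COALESCE. `connect`, `with_lead` and `with_row_number` are hypothetical helper names.

```python
import sqlite3

# The question's rows for code 73245 (assumption: ISO-8601 date text).
ROWS = [
    (73245, "1975-01-01", 10),
    (73245, "2006-04-18", 4),
    (73245, "2007-12-26", 10),
    (73245, "2009-01-30", 4),
    (73245, "2011-04-21", 2),
]


def connect():
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE mytable (code INTEGER, date_ TEXT, ratio INTEGER)")
    con.executemany("INSERT INTO mytable VALUES (?, ?, ?)", ROWS)
    return con


def with_lead(con):
    # LEAD's third argument is the value used when there is no next row
    return con.execute("""
        SELECT code, date_ AS startDate,
               LEAD(date_, 1, '2049-12-31')
                   OVER (PARTITION BY code ORDER BY date_) AS endDate,
               ratio
        FROM mytable
        ORDER BY code, date_
    """).fetchall()


def with_row_number(con):
    # Pair each row with the next one by row number; COALESCE fills the last
    return con.execute("""
        WITH q AS (
            SELECT *, ROW_NUMBER() OVER (PARTITION BY code ORDER BY date_) AS rn
            FROM mytable
        )
        SELECT q1.code, q1.date_ AS startDate,
               COALESCE(q2.date_, '2049-12-31') AS endDate, q1.ratio
        FROM q q1
        LEFT JOIN q q2 ON q2.code = q1.code AND q2.rn = q1.rn + 1
        ORDER BY q1.code, q1.date_
    """).fetchall()


con = connect()
for row in with_lead(con):
    print(row)
```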
How to get the dates of a week from its week number?
I want to get the dates of a given week. When I run the SQL with a week number (like 27) for the year 2012, it should show the dates from 2012-07-02 to 2012-07-08.
This query uses a single parameter, @weekno, as input and returns the 7 days in that week, taking Monday as the first day of the week. This definition of the week number does not follow SQL Server's DATEPART(week), because that depends on @@DATEFIRST; this one doesn't. The dateadd(...) line is an expression that returns the first Monday of the year. I got it from here. The line above it just adds the weeks to it, plus 0-6 to create the 7 days. To verify that this is correct for any year, change CURRENT_TIMESTAMP in the query to a date such as 20180708. FYI, 1-Jan-2018 is a Monday.
declare @weekno int = 27;
select (@weekno-1)*7+v.num+
dateadd(dd,(datediff(dd,0,dateadd(yy,datediff(yy,0,CURRENT_TIMESTAMP),6))/7)*7,0)
from (values(0),(1),(2),(3),(4),(5),(6))v(num)
order by num
-- results (with a 2012 date in place of CURRENT_TIMESTAMP)
July, 02 2012 00:00:00+0000
July, 03 2012 00:00:00+0000
July, 04 2012 00:00:00+0000
July, 05 2012 00:00:00+0000
July, 06 2012 00:00:00+0000
July, 07 2012 00:00:00+0000
July, 08 2012 00:00:00+0000
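The same arithmetic can be sketched in Python to check it for any year. The T-SQL expression takes 7 January of the year and floors it to a Monday (day 0, 1900-01-01, was a Monday), which yields the first Monday of the year; here `date.weekday()` (Monday = 0) does the flooring. The function name `week_dates` is hypothetical.

```python
from datetime import date, timedelta


def week_dates(year, weekno):
    """Dates of week `weekno`, where week 1 starts on the year's first Monday."""
    jan7 = date(year, 1, 7)
    # The Monday on or before 7 January is the first Monday of the year,
    # mirroring the T-SQL floor-to-a-multiple-of-7-days-since-1900 trick.
    first_monday = jan7 - timedelta(days=jan7.weekday())
    start = first_monday + timedelta(weeks=weekno - 1)
    return [start + timedelta(days=n) for n in range(7)]


for d in week_dates(2012, 27):
    print(d.isoformat())
```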