I have code that should return the names of the months, with the year, for the next 12 months starting from now.
E.g. it is now September, so the code should return a list of months with the year up to September 2023.
import datetime
month_names = "January February March April May June July August September October November December".split()
Year = '2022'
month_now = datetime.date.today().month
dict_of_dfs = {}
for i in range(month_now, len(month_names)):
    df_name = month_names[i]
    print(Year, i+1, '01')
This code returns only the months till the end of the year and I do not know how to change it.
The output should look like this:
2022 10 01
2022 11 01
2022 12 01
2023 01 01
2023 02 01
2023 03 01
...
2023 07 01
2023 08 01
2023 09 01
Check pd.date_range
pd.date_range(start = Year + '-' + str(month_now+1) + '-01', periods=12, freq='MS')
Another solution with pd.date_range (starting from today's date, because the integer month_now has no replace method):
pd.date_range(start=datetime.date.today().replace(day=1), periods=13, freq='MS')[1:]
Using Pendulum:
import pendulum
date_list = [pendulum.now().add(months=1).start_of("month").add(months=x).to_date_string() for x in range(12)]
print(date_list)
['2022-10-01', '2022-11-01', '2022-12-01', '2023-01-01', '2023-02-01', '2023-03-01', '2023-04-01', '2023-05-01', '2023-06-01', '2023-07-01', '2023-08-01', '2023-09-01']
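For comparison, the same list can be built with only the standard library. This is a minimal sketch and not part of the answers above; the helper name next_month_starts and its parameters are made up for illustration.

```python
import datetime


def next_month_starts(today, count=12):
    """Return the first day of each of the `count` months after `today`."""
    # Count months from year 0 so the year rollover is plain integer arithmetic.
    # `today.month` is 1-based, so this index already points at the next month.
    index = today.year * 12 + today.month
    starts = []
    for offset in range(count):
        year, month0 = divmod(index + offset, 12)
        starts.append(datetime.date(year, month0 + 1, 1))
    return starts


for d in next_month_starts(datetime.date.today()):
    print(d.year, f"{d.month:02d}", "01")
```

Because the month index is split with divmod, a start date in December rolls over to January of the next year without any special case.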
I have an issue: I am trying to select date values stored in SQL Server as strings in the format "Thu, 08 Jul 2021 06:08:20 -0700". I need to select the whole table with the newest date first, but I do not know how to convert this string into a date and sort by it. Thanks in advance.
Table
|Thu, 08 Jul 2021 06:08:20 -0700|
|Fri, 09 Jul 2021 01:08:20 -0700|
|Sun, 11 Jul 2021 07:08:20 -0700|
output (Newest Date first)
|Sun, 11 Jul 2021 07:08:20 -0700|
|Fri, 09 Jul 2021 01:08:20 -0700|
|Thu, 08 Jul 2021 06:08:20 -0700|
Your date is just missing a valid timezone offset value, so it needs a ":" inserted to make it -07:00. You can do this with STUFF, and use SUBSTRING to ignore the irrelevant day name. You don't state a specific database platform; for SQL Server you can then cast to a datetimeoffset, and other databases have similar but slightly different syntax. This assumes the strings are all formatted consistently, of course.
declare @d varchar(30)='Thu, 08 Jul 2021 06:08:20 -0700'
select Cast(Stuff(Substring(@d,6,26),25,0,':') as datetimeoffset(0))
Result
2021-07-08 06:08:20 -07:00
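If it helps to sanity-check the conversion and the sort order outside the database, the same strings can be parsed in Python. This is only a sketch in a different language than the answer; it relies on the standard library function email.utils.parsedate_to_datetime, and the list name rows is made up here.

```python
from email.utils import parsedate_to_datetime

# The three sample values from the question.
rows = [
    "Thu, 08 Jul 2021 06:08:20 -0700",
    "Fri, 09 Jul 2021 01:08:20 -0700",
    "Sun, 11 Jul 2021 07:08:20 -0700",
]

# parsedate_to_datetime understands this RFC 2822 style layout and returns a
# timezone-aware datetime, so values with different offsets compare correctly.
newest_first = sorted(rows, key=parsedate_to_datetime, reverse=True)

for row in newest_first:
    print(row, "->", parsedate_to_datetime(row).isoformat())
```

Sorting on the parsed value rather than on the text matters, because the strings start with the day name and would otherwise sort alphabetically.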
My dataset:
Date Num_orders
Mar 21 2019 69
Mar 22 2019 82
Mar 24 2019 312
Mar 25 2019 199
Mar 26 2019 2,629
Mar 27 2019 2,819
Mar 28 2019 3,123
Mar 29 2019 3,332
Mar 30 2019 1,863
Mar 31 2019 1,097
Apr 01 2019 1,578
Apr 02 2019 2,353
Apr 03 2019 2,768
Apr 04 2019 2,648
Apr 05 2019 3,192
Apr 06 2019 2,363
Apr 07 2019 1,578
Apr 08 2019 3,090
Apr 09 2019 3,814
Apr 10 2019 3,836
...
I need to calculate the monthly median number of orders from days of the same month:
The desired results:
Month Median_monthly
Mar 2019 1,863
Apr 2019 2,768
May 2019 2,876
Jun 2019 ...
...
I tried to use function date_trunc to extract month from the dataset then group by 'month' but it didn't work out. Thanks for your help, I use Google Bigquery (#standard) environment!
You probably tried to use PERCENTILE_CONT, which cannot be used with GROUP BY.
Try APPROX_QUANTILES(x, 100)[OFFSET(50)] instead. It should work with GROUP BY:
SELECT APPROX_QUANTILES(Num_orders, 100)[OFFSET(50)] AS median
FROM myTable
GROUP BY Month
Alternatively, you can use PERCENTILE_CONT within a subquery:
SELECT
  DISTINCT Month, median
FROM (
  SELECT
    Month,
    PERCENTILE_CONT(Num_orders, 0.5) OVER(PARTITION BY Month) AS median
  FROM myTable
)
This would often be done using DISTINCT:
SELECT DISTINCT DATE_TRUNC(date, MONTH),
       PERCENTILE_CONT(Num_orders, 0.5) OVER (PARTITION BY DATE_TRUNC(date, MONTH)) AS median
FROM myTable;
Note: There are two percentile functions, PERCENTILE_CONT() and PERCENTILE_DISC(). They have different results when there is a "tie" in the middle of the data.
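To make the difference concrete, here is a small Python sketch that groups a few rows by month and computes both kinds of median. It uses only eight rows copied from the question's sample, so the numbers will not match the desired monthly results. Treating statistics.median as the analogue of PERCENTILE_CONT and statistics.median_low as the analogue of PERCENTILE_DISC is an assumption made for illustration, not a statement about BigQuery's exact behaviour.

```python
from collections import defaultdict
from statistics import median, median_low

# A few rows copied from the sample data in the question (not the full months).
rows = [
    ("Mar 21 2019", "69"),
    ("Mar 22 2019", "82"),
    ("Mar 24 2019", "312"),
    ("Mar 25 2019", "199"),
    ("Apr 01 2019", "1,578"),
    ("Apr 02 2019", "2,353"),
    ("Apr 03 2019", "2,768"),
    ("Apr 04 2019", "2,648"),
]

by_month = defaultdict(list)
for date_text, orders_text in rows:
    month_name, _day, year = date_text.split()
    # Strip the thousands separator before converting to a number.
    by_month[f"{month_name} {year}"].append(int(orders_text.replace(",", "")))

for month, values in by_month.items():
    # median() interpolates between the two middle values;
    # median_low() returns the lower of them, which is an actual data point.
    print(month, median(values), median_low(values))
```

With an even number of rows per month the two columns differ: the interpolated median falls between the two middle values, while the discrete one is a value that actually occurs in the data.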
I have a field that contains strings with dates inside them.
e.g
Rate (20 Jan 2020 - 19 Feb 2020)
or Rate (6 Dec 2019 - 5 Jan 2020)
I need a Start Date and End Date out of the above strings in SQL.
I can get Start Date but End Date (after the -) is a problem
It's quite crude, but it will get you answers:
select left('20 Jan 2020 - 19 Feb 2020',CHARINDEX('-','20 Jan 2020 - 19 Feb 2020')-1)
,right('20 Jan 2020 - 19 Feb 2020',len('20 Jan 2020 - 19 Feb 2020') -CHARINDEX('-','20 Jan 2020 - 19 Feb 2020')-1)
CHARINDEX() will give you the position of the character you desire, in this case the dash. From there you can use LEFT(), RIGHT(), and LEN() to get the pieces out of the string that you need.
This works with SQL Server.
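The same split-on-the-dash idea can be checked quickly outside the database. The following Python sketch is only an illustration: the function name parse_rate_range and the regular expression are assumptions, and it expects the two sample layouts shown in the question.

```python
import re
from datetime import datetime

# Matches "<day> <Mon> <year> - <day> <Mon> <year>" inside the parentheses.
RANGE_RE = re.compile(r"\((\d{1,2} \w{3} \d{4})\s*-\s*(\d{1,2} \w{3} \d{4})\)")


def parse_rate_range(text):
    """Return (start_date, end_date) found in a string like the samples above."""
    match = RANGE_RE.search(text)
    if match is None:
        raise ValueError(f"no date range found in {text!r}")
    # %b expects English month abbreviations under the default C locale.
    start, end = (
        datetime.strptime(part, "%d %b %Y").date() for part in match.groups()
    )
    return start, end


print(parse_rate_range("Rate (20 Jan 2020 - 19 Feb 2020)"))
print(parse_rate_range("Rate (6 Dec 2019 - 5 Jan 2020)"))
```

Anchoring on the parentheses and the dash, rather than on fixed character positions, keeps the extraction working when the day has one digit instead of two.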
Say I have two tables (SQL Fiddle). One has recorded values at various timestamps; the other indicates IDs and datetimes to sample for nearest values. Using something similar to Kevin Meade's NEAREST NEIGHBOR PREFERENCE LOW (but in SQL Server 2008), I want to find the non-null value of the indicated ID closest to the target (census) date but not after the census date. If there is a row that matches the census date, use that one (unless it has a null value). If there is no row before the census date, then find the row that is closest to the census date but not before it, and use that one.
First Table:
CREATE TABLE Recorded_Vent_Types
([PAT_ENC_CSN_ID] int, [RECORDED_TIME] datetime, [MEAS_VALUE] varchar(9));
INSERT INTO Recorded_Vent_Types
([PAT_ENC_CSN_ID], [RECORDED_TIME], [MEAS_VALUE])
VALUES
(11117777, '2013-06-08 19:36:00.000', 'SIMV/PRVC'),
(11117777, '2013-06-08 22:21:00.000', 'PRVC/AC'),
(11117777, '2013-06-09 00:10:00.000', NULL),
(11117777, '2013-06-09 03:00:00.000', 'SIMV/PRVC'),
(11117777, '2013-06-09 23:59:00.000', 'SIMV/PRVC'),
(11117777, '2013-06-10 00:00:00.000', 'NAVA'),
(11117777, '2013-06-10 00:20:00.000', 'PS'),
(11117777, '2013-06-10 00:25:00.000', NULL),
(555999, '2013-06-08 00:36:00.000', NULL),
(555999, '2013-06-08 22:21:00.000', 'PRVC/AC'),
(555999, '2013-06-09 00:10:00.000', 'SIMV/PRVC'),
(555999, '2013-06-11 23:15:00.000', 'BIVENT'),
(555999, '2013-06-12 00:00:00.000', NULL),
(555999, '2013-06-12 00:20:00.000', 'PS');
Second Table:
CREATE TABLE Census
([PAT_ENC_CSN_ID] int, [CENSUS_TIME] datetime);
INSERT INTO Census
([PAT_ENC_CSN_ID], [CENSUS_TIME])
VALUES
(11117777, '2013-06-08 00:00:00'),
(11117777, '2013-06-09 00:00:00'),
(11117777, '2013-06-10 00:00:00'),
(11117777, '2013-06-11 00:00:00'),
(555999, '2013-06-08 00:00:00'),
(555999, '2013-06-09 00:00:00'),
(555999, '2013-06-11 00:00:00'),
(555999, '2013-06-12 00:00:00');
Here's Mr Meade's Oracle Code for something similar within one table for given ID:
select *
from claim_history
where claim_id = 1
and status_date =
(
select min(status_date)
from (
select max(status_date) status_date
from claim_history
where claim_id = 1
and status_date <= sysdate-3
union all
select min(status_date)
from claim_history
where claim_id = 1
and status_date > sysdate-3
)
)
/
My Desired Result set:
PAT_ENC_CSN_ID CENSUS_TIME RECORDED_TIME MEAS_VALUE
555999 June, 08 2013 00:00:00+0000 June, 08 2013 22:21:00+0000 PRVC/AC
555999 June, 09 2013 00:00:00+0000 June, 08 2013 22:21:00+0000 PRVC/AC
555999 June, 11 2013 00:00:00+0000 June, 09 2013 00:10:00+0000 SIMV/PRVC
555999 June, 12 2013 00:00:00+0000 June, 11 2013 23:15:00+0000 BIVENT
11117777 June, 08 2013 00:00:00+0000 June, 08 2013 19:36:00+0000 SIMV/PRVC
11117777 June, 09 2013 00:00:00+0000 June, 08 2013 22:21:00+0000 PRVC/AC
11117777 June, 10 2013 00:00:00+0000 June, 10 2013 00:00:00+0000 NAVA
11117777 June, 11 2013 00:00:00+0000 June, 10 2013 00:20:00+0000 PS
@Gordon Linoff gave me the idea to use absolute values of the date diff between census times and recorded times. This led me to modify @bobs' solution here.
SELECT * FROM
(
SELECT rvt.PAT_ENC_CSN_ID, CENSUS_TIME, RECORDED_TIME, MEAS_VALUE, ABS(DATEDIFF(s, c.CENSUS_TIME, RECORDED_TIME)) diff,
ROW_NUMBER() OVER (PARTITION BY rvt.PAT_ENC_CSN_ID, c.CENSUS_TIME ORDER BY ABS(DATEDIFF(s, c.CENSUS_TIME, RECORDED_TIME))) AS SEQUENCE
FROM Recorded_Vent_Types rvt join Census c on rvt.PAT_ENC_CSN_ID=c.PAT_ENC_CSN_ID
WHERE MEAS_VALUE IS NOT NULL
) as m
WHERE SEQUENCE = 1
ORDER BY PAT_ENC_CSN_ID,CENSUS_TIME
;
But this returns the (absolute) closest recorded time, with no preference given to a recorded time prior to the census time.
Result:
PAT_ENC_CSN_ID CENSUS_TIME RECORDED_TIME MEAS_VALUE
555999 June, 08 2013 00:00:00+0000 June, 08 2013 22:21:00+0000 PRVC/AC
555999 June, 09 2013 00:00:00+0000 June, 09 2013 00:10:00+0000 SIMV/PRVC
555999 June, 11 2013 00:00:00+0000 June, 11 2013 23:15:00+0000 BIVENT
555999 June, 12 2013 00:00:00+0000 June, 12 2013 00:20:00+0000 PS
11117777 June, 08 2013 00:00:00+0000 June, 08 2013 19:36:00+0000 SIMV/PRVC
11117777 June, 09 2013 00:00:00+0000 June, 08 2013 22:21:00+0000 PRVC/AC
11117777 June, 10 2013 00:00:00+0000 June, 10 2013 00:00:00+0000 NAVA
11117777 June, 11 2013 00:00:00+0000 June, 10 2013 00:20:00+0000 PS
You can do this as a correlated subquery -- in both Oracle and SQL Server, because this is almost standard SQL except for the top 1.
Here is the query:
select *,
(select top 1 PAT_ENC_CSN_ID
from census c
where c.census_time <= rvt.recorded_time
order by (case when c.census_time <= rvt.recorded_time then 1 else 0
end) desc,
(case when c.census_time <= rvt.recorded_time then c.census_time
end) desc,
c.census_time asc
) as nearestVal
from Recorded_Vent_Types rvt
The subquery returns one row, based on the order by, which is key to the query. It has three parts.
The first puts all census times before the recorded time at the beginning. The second sorts these by census time in descending order, the third sorts the rest by ascending time. I would like to replace the last two with:
abs(c.census_time - rvt.recorded_time)
because this is logically what it does. Alas, that doesn't work, because abs() doesn't work on datetime. I would then have to use the datediff() function or a case statement, and it would start to look more complicated.
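As a cross-check of the selection rule itself (latest non-null reading at or before the census time, otherwise the earliest non-null reading after it), here is a small Python sketch run against the sample rows from the question. It is not a translation of the query above; the function name nearest_prefer_low is hypothetical and the approach is a plain in-memory filter, intended only to confirm which rows the rule should pick.

```python
from datetime import datetime

# (PAT_ENC_CSN_ID, RECORDED_TIME, MEAS_VALUE) rows copied from the question.
recorded = [
    (11117777, "2013-06-08 19:36:00", "SIMV/PRVC"),
    (11117777, "2013-06-08 22:21:00", "PRVC/AC"),
    (11117777, "2013-06-09 00:10:00", None),
    (11117777, "2013-06-09 03:00:00", "SIMV/PRVC"),
    (11117777, "2013-06-09 23:59:00", "SIMV/PRVC"),
    (11117777, "2013-06-10 00:00:00", "NAVA"),
    (11117777, "2013-06-10 00:20:00", "PS"),
    (11117777, "2013-06-10 00:25:00", None),
    (555999, "2013-06-08 00:36:00", None),
    (555999, "2013-06-08 22:21:00", "PRVC/AC"),
    (555999, "2013-06-09 00:10:00", "SIMV/PRVC"),
    (555999, "2013-06-11 23:15:00", "BIVENT"),
    (555999, "2013-06-12 00:00:00", None),
    (555999, "2013-06-12 00:20:00", "PS"),
]

# (PAT_ENC_CSN_ID, CENSUS_TIME) rows copied from the question.
census = [
    (11117777, "2013-06-08 00:00:00"),
    (11117777, "2013-06-09 00:00:00"),
    (11117777, "2013-06-10 00:00:00"),
    (11117777, "2013-06-11 00:00:00"),
    (555999, "2013-06-08 00:00:00"),
    (555999, "2013-06-09 00:00:00"),
    (555999, "2013-06-11 00:00:00"),
    (555999, "2013-06-12 00:00:00"),
]


def nearest_prefer_low(csn_id, census_time):
    """Latest non-null reading at or before census_time, else the earliest after."""
    target = datetime.fromisoformat(census_time)
    # Keep only this encounter's readings that actually have a value.
    readings = [
        (datetime.fromisoformat(ts), value)
        for rid, ts, value in recorded
        if rid == csn_id and value is not None
    ]
    at_or_before = [r for r in readings if r[0] <= target]
    if at_or_before:
        return max(at_or_before, key=lambda r: r[0])
    after = [r for r in readings if r[0] > target]
    return min(after, key=lambda r: r[0]) if after else None


for csn_id, census_time in census:
    print(csn_id, census_time, nearest_prefer_low(csn_id, census_time))
```

Note that null values are filtered out before the before/after split, which is what lets the 2013-06-12 census for 555999 fall back to the 23:15 reading from the previous day.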