Get phase of day based on solar altitude/azimuth - skyfield

As per the skyfield documentation online, I am able to calculate whether a given time of day is day or night.
import pandas as pd
import skyfield.api
import skyfield.almanac

ts = skyfield.api.load.timescale()
ephemeris = skyfield.api.load('de421.bsp')
observer = skyfield.api.Topos(latitude_degrees=LAT, longitude_degrees=LON)
is_day_or_night = skyfield.almanac.sunrise_sunset(ephemeris, observer)
day_or_night = is_day_or_night(ts.utc(merged_df.index.to_pydatetime()))
s = pd.Series(data=['Day' if is_day else 'Night' for is_day in day_or_night],
              index=merged_df.index, dtype='category')
merged_df['Day or Night'] = s
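As an aside, skyfield.almanac also offers dark_twilight_day, which distinguishes night, the three twilights, and day, but it still does not split daytime into morning/noon/evening. A sketch of what I mean, reusing the observer above:
f = skyfield.almanac.dark_twilight_day(ephemeris, observer)
# Returns 0 for night, 1-3 for astronomical/nautical/civil twilight, 4 for day
twilight_codes = f(ts.utc(merged_df.index.to_pydatetime()))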
Now, I want to also categorize the morning/noon/evening phases of the day according to solar altitude/azimuth. I came up with the following.
earth, sun = ephemeris['earth'], ephemeris['sun']
observer = earth + skyfield.api.Topos(latitude_degrees=LAT,
                                      longitude_degrees=LON)
astrometric = observer.at(ts.utc(merged_df.index.to_pydatetime())).observe(sun)
alt, az, d = astrometric.apparent().altaz()
I need help understanding how to proceed from here, since I don't have the background in astronomical calculations.
Thanks

As per Brandon's comment, I used the cosine of the zenith angle to get the sunlight intensity, and then combined the intensity with the zenith angle to form a function that increases monotonically over the course of the day.
import numpy as np

temp = pd.DataFrame({'zenith': 90 - alt.degrees,
                     'Day or Night': day_or_night},
                    index=merged_df.index)
temp['cos zenith'] = np.cos(np.deg2rad(temp['zenith']))  # relative sunlight intensity
# Forward difference of intensity, scaled by the zenith angle: negative while
# the sun climbs, near zero around solar noon, positive while it descends.
temp['feature'] = temp['cos zenith'].diff(periods=-1) * temp['zenith']
temp['Day Phase'] = None
temp.loc[temp['Day or Night'] == False, 'Day Phase'] = 'Night'
temp.loc[(temp['feature'] > -np.inf) & (temp['feature'] <= -0.035), 'Day Phase'] = 'Morning'
temp.loc[(temp['feature'] > -0.035) & (temp['feature'] <= 0.035), 'Day Phase'] = 'Noon'
temp.loc[(temp['feature'] > 0.035) & (temp['feature'] <= np.inf), 'Day Phase'] = 'Evening'
merged_df['Phase of Day'] = temp['Day Phase']
The limits can be adjusted to change the duration required for noon, etc.
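Alternatively, the daytime rows could be split on the sun's azimuth instead of a tuned threshold: before local solar noon the sun sits east of the meridian, after it west. A rough sketch along those lines, reusing alt/az from above; the 170-190 degree band for 'Noon' is an arbitrary choice, and the 180-degree meridian crossing assumes a northern-hemisphere observer:
az_deg = az.degrees  # solar azimuth in degrees, 0 = north, 180 = south
phase = np.where(~day_or_night, 'Night',
                 np.where(az_deg < 170, 'Morning',
                          np.where(az_deg <= 190, 'Noon', 'Evening')))
merged_df['Phase of Day (azimuth)'] = pd.Categorical(phase)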

pandas group by according to selected days of the week

I have this dataframe:
import numpy as np
import pandas as pd

rng = pd.date_range(start='2018-01-01', end='2018-01-21')
rnd_values = np.random.rand(len(rng)) + 3
df = pd.DataFrame({'time': rng.to_list(), 'value': rnd_values})
let's say that I want to group it according to the day of the week and compute the mean:
df['span'] = np.where(df['time'].dt.day_of_week <= 2, 'Mn-Wd', 'Th-Sn')
df['wkno'] = df['time'].dt.isocalendar().week.shift(fill_value=0)
df.groupby(['wkno', 'span']).mean()
However, I would like to make this procedure more general.
Let's say that I define the following days of the week:
days=['Monday','Thursday']
Is there any option that allows me to do what I have done by using "days"? I imagine that I have to compute the number of days between 'Monday' and 'Thursday' and then use that number. What about the case when
days=['Monday','Thursday','Friday']
I was thinking of setting up a dictionary:
days={'Monday':0,'Thursday':3,'Friday':4}
then
idays = list(days.values())[:]
How can I now use idays inside np.where? Indeed, I have three intervals.
Thanks
If you want to use more than one threshold you need np.searchsorted. The resulting function would look something like:
def groupby_daysspan_week(dfc, days):
    df = dfc.copy()
    day_to_dayofweek = {'Monday': 0, 'Tuesday': 1, 'Wednesday': 2,
                        'Thursday': 3, 'Friday': 4, 'Saturday': 5, 'Sunday': 6}
    short_dict = {0: 'Mn', 1: 'Tu', 2: 'Wd', 3: 'Th', 4: 'Fr', 5: 'St', 6: 'Sn'}
    day_split = [day_to_dayofweek[d] for d in days]
    df['wkno'] = df['time'].dt.isocalendar().week
    df['dow'] = df['time'].dt.day_of_week
    # searchsorted assigns each day-of-week to the interval between thresholds
    df['span'] = np.searchsorted(day_split, df['dow'], side='right')
    span_name_dict = {i + 1: short_dict[day_split[i]] + '-' + short_dict[(day_split + [6])[i + 1]]
                      for i in range(len(day_split))}
    df_agg = df.groupby(['wkno', 'span'])['value'].mean()
    df_agg = df_agg.rename(index=span_name_dict, level=1)
    return df_agg
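For example, with the sample df from the question (the span labels, e.g. 'Mn-Th' and 'Th-Sn', follow from short_dict):
weekly_means = groupby_daysspan_week(df, ['Monday', 'Thursday'])
# Three split days work the same way and yield three spans
weekly_means_3 = groupby_daysspan_week(df, ['Monday', 'Thursday', 'Friday'])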

Efficient way to expand a DataFrame in Julia

I have a dataframe with exposure episodes per case:
using DataFrames
using Dates

df = DataFrame(id = [1, 1, 2, 3],
               startdate = [Date(2018,3,1), Date(2019,4,2), Date(2018,6,4), Date(2018,5,1)],
               enddate = [Date(2019,4,4), Date(2019,8,5), Date(2019,3,1), Date(2019,4,15)])
I want to expand each episode to its constituent days, eliminating any duplicate days per case resulting from overlapping episodes (case 1 in the example dataframe):
s = similar(df, 0)
for row in eachrow(df)
    tf = DataFrame(row)
    ttf = repeat(tf, Dates.value.(row.enddate - row.startdate) + 1)
    ttf.daydate = ttf.startdate .+ Dates.Day.(0:nrow(ttf) - 1) # a record for each day between the start and end dates (inclusive)
    ttf.start = ttf.daydate .== ttf.startdate   # flag: record is at the start of an episode
    ttf[!, "end"] = ttf.daydate .== ttf.enddate # flag: record is at the end of an episode ("end" is a reserved word, hence the indexing syntax)
    append!(s, ttf, cols = :union)
end
sort!(s, [:id, :daydate, :startdate, order(:enddate, rev = true)])
unique!(s, [:id, :daydate]) # eliminate duplicate dates where episodes overlap (e.g. case 1)
I have a strong suspicion that there is a more efficient way of doing this than the brute-force method I came up with, and any help will be appreciated.
Implementation note: In the actual implementation there are several hundred thousand cases, each with relatively few episodes (median = 1, 75th percentile = 3, maximum = 20), but spanning 20 years or more of exposure, resulting in a very large dataset (several hundred million records). To fit into available memory I have partitioned the dataset on id and used the Threads.@threads macro to loop through the partitions in parallel. The primary purpose of this decomposition into days is not just to eliminate overlaps, but to join the data with other exposure data that is available on a per-day basis.
Below is a more complete solution that takes into account some essential details. Each episode is associated with additional attributes; as an example I used locationid (the place where the exposure took place) and the need to indicate whether there was a gap between subsequent episodes. The original solution also did not cater for the special case where an episode is fully contained within another episode - such episodes should not be expanded.
using Dates
using DataFrames

function process(startdate, enddate, locationid)
    start = startdate[1]
    stop = enddate[1]
    location = locationid[1]
    res_daydate = collect(start:Day(1):stop)
    res_startdate = fill(start, length(res_daydate))
    res_enddate = fill(stop, length(res_daydate))
    res_location = fill(location, length(res_daydate))
    gap = 0
    res_gap = fill(0, length(res_daydate))
    for i in 2:length(startdate)
        if startdate[i] > res_daydate[end]
            start = startdate[i]
        elseif enddate[i] > res_daydate[end]
            start = res_daydate[end] + Day(1)
        else
            continue # this episode is fully contained within the previous episode
        end
        if start - res_daydate[end] > Day(1)
            gap = gap == 0 ? 1 : 0 # toggle the gap indicator when a gap precedes this episode
        end
        stop = enddate[i]
        location = locationid[i]
        new_daydate = start:Day(1):stop
        append!(res_daydate, new_daydate)
        append!(res_startdate, fill(startdate[i], length(new_daydate)))
        append!(res_enddate, fill(stop, length(new_daydate)))
        append!(res_location, fill(location, length(new_daydate)))
        append!(res_gap, fill(gap, length(new_daydate)))
    end
    return (daydate = res_daydate, startdate = res_startdate, enddate = res_enddate,
            locationid = res_location, gap = res_gap)
end
function eliminateoverlap()
    df = DataFrame(id = [1, 1, 2, 3, 3, 4, 4],
                   startdate = [Date(2018,3,1), Date(2019,4,2), Date(2018,6,4), Date(2018,5,1),
                                Date(2019,5,1), Date(2012,1,1), Date(2012,2,2)],
                   enddate = [Date(2019,4,4), Date(2019,8,5), Date(2019,3,1), Date(2019,4,15),
                              Date(2019,6,15), Date(2012,6,30), Date(2012,2,10)],
                   locationid = [10, 11, 21, 30, 30, 40, 41])
    dfs = sort(df, [:startdate, order(:enddate, rev = true)])
    gdf = groupby(dfs, :id, sort = true)
    r = combine(gdf, [:startdate, :enddate, :locationid] => process => AsTable)
    df = combine(groupby(r, [:id, :gap, :locationid]),
                 :daydate => minimum => :StartDate, :daydate => maximum => :EndDate)
    return df
end

df = eliminateoverlap()
Here is something that should be efficient:
dfs = sort(df, [:startdate, order(:enddate, rev = true)])
gdf = groupby(dfs, :id, sort = true)

function process(startdate, enddate)
    start = startdate[1]
    stop = enddate[1]
    res_daydate = collect(start:Day(1):stop)
    res_startdate = fill(start, length(res_daydate))
    res_enddate = fill(stop, length(res_daydate))
    for i in 2:length(startdate)
        if startdate[i] > res_daydate[end]
            start = startdate[i]
            stop = enddate[i]
        elseif enddate[i] > res_daydate[end]
            start = res_daydate[end] + Day(1)
            stop = enddate[i]
        end
        new_daydate = start:Day(1):stop
        append!(res_daydate, new_daydate)
        append!(res_startdate, fill(startdate[i], length(new_daydate)))
        append!(res_enddate, fill(stop, length(new_daydate)))
    end
    return (startdate = res_startdate, enddate = res_enddate, daydate = res_daydate)
end

combine(gdf, [:startdate, :enddate] => process => AsTable)
(But please check its correctness against your implementation on larger data, as I have just written it quickly to show you how to write performant implementations with DataFrames.jl.)

Data handling on multiple Heart rate files

I have been collecting the heart rates of 12 calves that each received an anesthetic through four different routes of administration. I now have 48 txt files of this format:
Time       HRbpm
0:00:01.7  97
0:00:02.3  121
0:00:02.8  15
...        ...
HR was recorded for around 2 hours. The Time column depended on the monitor, resulting in inconsistent time intervals between measures.
The txt files are named as follows: 6133_IM_27.00.txt
With 6133 being the ID, IM the route, and 27.00 the time (minutes.seconds) at which the treatment was injected.
My first goal is to have all the HR data so I can do an outlier analysis.
Then, I would like to include all this data in a single data frame that would look like this:
data.frame(ID = c(6133, 6133, 6133, 6133, "...", 6134, 6134, "..."),
           Route = c("IM", "IM", "IM", "IM", "...", "SC", "SC", "..."),
           time = c(0, 10, 20, 30, "...", 0, 10, "..."),
           HR = c(160, 150, 145, 130, "...", 162, 158, "..."))
The time column goes from 0 to 120 in 10-minute increments.
Each HR value in this df would represent the mean of the HR values over the preceding minute (e.g. for time = 30, HR would be the mean between minutes 29 and 30 for a given ID/Route combination).
I'm fairly new to R, so I've been having trouble knowing from what angle to start on this problem. Any help would be welcome.
Thanks,
Thomas
For anyone who stumbles on this post, here's what I've done; it seems to be working.
library(plyr)
library(reshape)
library(ggplot2)
setwd("/directory")
filelist = list.files(pattern = ".*.txt")
datalist = lapply(filelist, read.delim)
for (i in 1:length(datalist)) {
  datalist[[i]][3] = filelist[i]  # store the source filename in column 3 (V3)
}
df = do.call("rbind", datalist)
attach(df)
out_lowHR = quantile(HRbpm,0.25)-1.5*IQR(HRbpm)
out_highHR = quantile(HRbpm,0.75)+1.5*IQR(HRbpm) #outliers thresholds: 60 and 200
dfc = subset(df,HRbpm>=60 & HRbpm<=200)
(length(df$HRbpm)-length(dfc$HRbpm))/length(df$HRbpm)*100 #8.6% of values excluded
df = dfc
df$ID = substr(df$V3,4,7)
df$ROA = substr(df$V3,9,11)
df$ti = substr(df$V3,13,17)
df$Time = as.POSIXct(as.character(df$Time), format="%H:%M:%S")
df$ti = as.POSIXct(as.character(df$ti), format="%M.%S")
df$t = as.numeric(df$Time-df$ti)
m=60
meanHR = ddply(df, c("ROA", "ID"), summarise,
               mean0   = mean(HRbpm[t > -60*m & t <= 0]),
               mean10  = mean(HRbpm[t > 9*m   & t <= 10*m]),
               mean20  = mean(HRbpm[t > 19*m  & t <= 20*m]),
               mean30  = mean(HRbpm[t > 29*m  & t <= 30*m]),
               mean45  = mean(HRbpm[t > 44*m  & t <= 45*m]),
               mean60  = mean(HRbpm[t > 59*m  & t <= 60*m]),
               mean90  = mean(HRbpm[t > 89*m  & t <= 90*m]),
               mean120 = mean(HRbpm[t > 119*m & t <= 120*m]))
meanHR = melt(meanHR)
meanHR$time = as.numeric(gsub("mean", "", meanHR$variable))
ggplot(meanHR, aes(x = time, y = value, col = ROA)) +
  geom_smooth() +
  theme_classic()

How to adjust the accumulation period and weights in Python code (effective drought index, EDI: Byun & Wilhite 1999)

I have written some code, but there is a big problem: it takes too long to run.
In short, the definition of the EDI (Byun and Wilhite 1999):
EDI is an accumulated and weighted precipitation index.
pr = precipitation; the accumulation duration is 365 days (1 year), and afterwards the accumulation duration is redefined.
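For reference, my reading of the EP definition in Byun & Wilhite (1999), with $P_m$ the precipitation $m-1$ days before the current day and $DS$ the accumulation duration:

$$EP = \sum_{n=1}^{DS} \frac{\sum_{m=1}^{n} P_m}{n}, \qquad DEP = EP - MEP$$

where $MEP$ is the climatological mean of $EP$ for that calendar day.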
The big problem is the redefinition code.
My code is below.
Main code: return_redifine_EP_MEP_DS
Sub code: return_EP_v2, return_MEP_v2, return_DEP_v2
Additional explanation:
I'm using Python with iris, numpy, etc.
EP_cube.data and MEP_cube.data: ".data" gives the underlying arrays.
Data sets: CMIP5 model precipitation (time, latitude, longitude).
origin_pr is precipitation with shape (time, latitude, longitude): 1971-2000.
EP_cube is accumulated precipitation (time, latitude, longitude): 1972-2000.
MEP_cube is the climatological mean (for each calendar day) of EP_cube (time, latitude, longitude): 365 days.
DEP_cube is EP_cube minus the climatological mean of EP_cube (MEP) (time, latitude, longitude): 1972-2000.
sy, ey are the climatology years.
day is each model's number of days per year (for example, HadGEM2-AO: 360 days; ACCESS1-0: 365 days).
def return_redifine_EP_MEP_DS(origin_pr, EP_cube, MEP_cube, DEP_cube, sy, ey, day):
    # NOTE: MEP_cube added to the argument list; it is used and returned below
    # but was missing from the original signature.
    origin_DS = day
    origin_day = day
    DS_cube = EP_cube - EP_cube.data + day
    DS = origin_DS
    o_DEP_cube = DEP_cube.copy()
    return_time = 0
    for j in range(0, DEP_cube.data.shape[1]):
        for k in range(0, DEP_cube.data.shape[2]):
            for i in range(1, DEP_cube.data.shape[0]):
                # Extend the duration while DEP stays negative on consecutive days
                if DEP_cube.data[i, j, k] < 0 and DEP_cube.data[i - 1, j, k] < 0:
                    DS = DS + 1
                else:
                    DS = origin_DS
                if DS != origin_DS:
                    EP_cube.data[i, j, k] = return_EP_v2(origin_pr[i - DS + origin_day:i + origin_day, j, k], DS)
                    day_of_year = DEP_cube[i, j, k].coord('day_of_year').points
                    MEP_cube.data[day_of_year - 1, j, k] = return_MEP_v2(EP_cube[:, j, k], sy, ey, day_of_year)
                    DEP_cube.data[i, j, k] = return_DEP_v2(EP_cube[i, j, k], MEP_cube[day_of_year - 1, j, k])
    return EP_cube, MEP_cube, DEP_cube, DS_cube
And the sub code is below:
import numpy as np
import iris

def return_EP_v2(origin_pr, DS):
    weights = np.arange(DS, 0, -1)  # DS, DS-1, ..., 1: the oldest day gets the largest divisor
    EP_data = np.sum(origin_pr.data / weights)
    return EP_data

def return_MEP_v2(EP_cube, sy, ey, day_of_year):
    # Extract the climatology years and the wanted calendar day (day of year)
    ex_EP_cube = EP_cube.extract(iris.Constraint(year=lambda t: sy <= t <= ey,
                                                 day_of_year=day_of_year))
    # Climatological mean of EP_cube
    MEP_data = ex_EP_cube.collapsed('day_of_year', iris.analysis.MEAN).data
    return MEP_data

def return_DEP_v2(EP_cube, MEP_cube):
    DEP_data = EP_cube.data - MEP_cube.data
    return DEP_data
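Side note on the speed problem: return_EP_v2 just computes a weighted window sum (the oldest day divided by DS, the newest by 1), so for a fixed DS it can be evaluated for a whole time series at once with a convolution instead of once per loop iteration. A minimal numpy sketch; ep_series is a hypothetical helper and it does not handle the redefinition of DS:
import numpy as np

def ep_series(pr, ds):
    # pr: 1-D daily precipitation, oldest value first; ds: accumulation duration.
    # The kernel is ordered so that within each window the oldest day is
    # divided by ds and the newest by 1, matching return_EP_v2's weights.
    kernel = 1.0 / np.arange(1, ds + 1)
    return np.convolve(pr, kernel, mode='valid')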

matplotlib x-axis ticks dates formatting and locations

I've tried to duplicate plotted graphs originally created with flotr2 for PDF output with matplotlib. I must say that flotr is way easier to use... but that aside, I'm currently stuck trying to format the dates/times on the x-axis to the desired format: hours:minutes with an interval of every 2 hours if the period on the x-axis is less than one day, and year-month-day format with an interval of one day if the period is longer than 1 day.
I've read through numerous examples and tried to copy them, but the outcome remains the same, which is hours:minutes:seconds with a 1 to 3 hour interval depending on how long the period is.
My code:
colorMap = {
    'speed': '#3388ff',
    'fuel': '#ffaa33',
    'din1': '#3bb200',
    'din2': '#ff3333',
    'satellites': '#bfbfff'
}
otherColors = ['#00A8F0', '#C0D800', '#CB4B4B', '#4DA74D', '#9440ED', '#800080',
               '#737CA1', '#E4317F', '#7D0541', '#4EE2EC', '#6698FF', '#437C17',
               '#7FE817', '#FBB117']
plotMap = {}
import datetime  # needed for the timedelta comparison below (was missing)
import matplotlib
matplotlib.use('Agg')
import matplotlib.pyplot as plt
import matplotlib.dates as dates

fig = plt.figure(figsize=(22, 5), dpi=300, edgecolor='k')
ax1 = fig.add_subplot(111)
realdata = data['data']
keys = list(realdata.keys())
if 'speed' in keys:
    speed_index = keys.index('speed')
    keys.pop(speed_index)
    keys.insert(0, 'speed')
i = 0
for key in keys:
    if key not in colorMap.keys():
        color = otherColors[i]
        otherColors.pop(i)
        colorMap[key] = color
        i += 1
label = u'%s' % realdata[keys[0]]['name']
ax1.set_ylabel(label)
plotMap[keys[0]] = {}
plotMap[keys[0]]['label'] = label
first_dates = [r[0] for r in realdata[keys[0]]['data']]
date_range = first_dates[-1] - first_dates[0]
ax1.xaxis.reset_ticks()
if date_range > datetime.timedelta(days=1):
    ax1.xaxis.set_major_locator(dates.WeekdayLocator(byweekday=1, interval=1))
    ax1.xaxis.set_major_formatter(dates.DateFormatter('%Y-%m-%d'))
else:
    ax1.xaxis.set_major_locator(dates.HourLocator(byhour=range(24), interval=2))
    ax1.xaxis.set_major_formatter(dates.DateFormatter('%H:%M'))
ax1.xaxis.grid(True)
plotMap[keys[0]]['plot'] = ax1.plot_date(
    dates.date2num(first_dates),
    [r[1] for r in realdata[keys[0]]['data']], colorMap[keys[0]], xdate=True)
if len(keys) > 1:
    first = True
    for key in keys[1:]:
        if first:
            ax2 = ax1.twinx()
            ax2.set_ylabel(u'%s' % realdata[key]['name'])
            first = False
        plotMap[key] = {}
        plotMap[key]['label'] = u'%s' % realdata[key]['name']
        plotMap[key]['plot'] = ax2.plot_date(
            dates.date2num([r[0] for r in realdata[key]['data']]),
            [r[1] for r in realdata[key]['data']], colorMap[key], xdate=True)
plt.legend([value['plot'] for key, value in plotMap.items()],
           [value['label'] for key, value in plotMap.items()], loc=2)
plt.savefig(path + "node.png", dpi=300, bbox_inches='tight')
Could someone point out why I'm not getting the desired results, please?
Edit 1:
I moved the formatting block after the plotting and seem to be getting better results now. They are still not the desired results, though. If the period is less than a day, I get ticks every 2 hours (interval=2), but I wish I could get those ticks at even hours rather than odd ones. Is that possible?
if date_range > datetime.timedelta(days=1):
    xax.set_major_locator(dates.DayLocator(bymonthday=range(1, 32), interval=1))
    xax.set_major_formatter(dates.DateFormatter('%Y-%m-%d'))
else:
    xax.set_major_locator(dates.HourLocator(byhour=range(24), interval=2))
    xax.set_major_formatter(dates.DateFormatter('%H:%M'))
Edit 2:
This seemed to give me what I wanted:
if date_range > datetime.timedelta(days=1):
    xax.set_major_locator(dates.DayLocator(bymonthday=range(1, 32), interval=1))
    xax.set_major_formatter(dates.DateFormatter('%Y-%m-%d'))
else:
    xax.set_major_locator(dates.HourLocator(byhour=range(0, 24, 2)))
    xax.set_major_formatter(dates.DateFormatter('%H:%M'))
Alan
You are making this way harder on yourself than you need to. matplotlib can plot directly against datetime objects. I suspect your problem is that you are setting up the locators, then plotting, and the plotting is replacing your locators/formatters with the default auto versions. Try moving that block of logic about the locators below the plotting loop.
I think that this could replace a fair chunk of your code:
import datetime
import matplotlib.pyplot as plt

d = datetime.timedelta(minutes=2)
now = datetime.datetime.now()
times = [now + d * j for j in range(500)]
ax = plt.gca()  # get the current axes
ax.plot(times, range(500))
xax = ax.get_xaxis()  # get the x-axis
adf = xax.get_major_formatter()  # get the auto-formatter
adf.scaled[1. / 24] = '%H:%M'  # set the < 1d scale to H:M
adf.scaled[1.0] = '%Y-%m-%d'   # set the > 1d < 1m scale to Y-m-d
adf.scaled[30.] = '%Y-%m'      # set the > 1m < 1Y scale to Y-m
adf.scaled[365.] = '%Y'        # set the > 1y scale to Y
plt.draw()
See the docs for AutoDateFormatter.
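On newer matplotlib versions (3.1+), ConciseDateFormatter paired with AutoDateLocator gives similarly adaptive tick labels with less setup. A minimal sketch; the variable names are mine:
import matplotlib.dates as mdates
import matplotlib.pyplot as plt

ax = plt.gca()
locator = mdates.AutoDateLocator()
formatter = mdates.ConciseDateFormatter(locator)
ax.xaxis.set_major_locator(locator)      # adaptive tick positions
ax.xaxis.set_major_formatter(formatter)  # compact date labels at any zoom level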
I achieved what I wanted by doing this:
if date_range > datetime.timedelta(days=1):
    xax.set_major_locator(dates.DayLocator(bymonthday=range(1, 32), interval=1))
    xax.set_major_formatter(dates.DateFormatter('%Y-%m-%d'))
else:
    xax.set_major_locator(dates.HourLocator(byhour=range(0, 24, 2)))
    xax.set_major_formatter(dates.DateFormatter('%H:%M'))