pandas.date_range -- freq="WOM-3FRI", how to understand that offset alias? - pandas

I've been trying to learn pandas in a lab class. One part of our lab manual goes over generating time-based indices with the date_range function. The class's lab manual says:
The freq parameter accepts a variety of string representations, referred to as offset aliases. See Table 1.3 for a sampling of some of the options. For a complete list of the options, see http://pandas.pydata.org/pandas-docs/stable/timeseries.html#offset-aliases.
I checked through the 'offset-alias' and 'anchored offsets' sections of the online documentation. Most of the entries in table 1.3 can be understood from those two sections.
However, the last entry of the table is "WOM-3FRI". The table says this corresponds to a frequency of every 3rd Friday of the month. I have no idea how to deduce that from the online documentation. It looks like "WOM" is being used as the alias and "3FRI" is being used as an anchor. But "WOM" is not listed as an alias in the online documentation, so I'm struggling to make sense of what's happening here.
One hypothesis I have is that this is some sort of operation.
The online documentation and my lab book have a couple of examples where prepending a number to an alias multiplies the length of the period by that number. So '2' operates in a way that '2M' creates a frequency of every 2 months. Similarly, '5' operates in a way that '5Y' creates a frequency of every 5 years. Does 'O' somehow operate in a way that the offset alias 'XOY' gives the xth sub-period of period Y? For example, would "MOY-5" give the 5th month of the year? Would "DOY-7FRI" give the 7th Friday of the year?
Another hypothesis I have is that "WOM" is a new-fangled alias, and "3FRI" is an anchor for it. However, the online documentation does not list "WOM". I checked, and it was the pandas 0.23.4 documentation. My lab machine is running version 0.23.4, and it can handle "WOM-3FRI" just fine. Have they just not updated the documentation yet?
Could anyone clear up the method/theory behind creating "WOM-3FRI"?
Lab manual with Table 1.3: http://www.acme.byu.edu/wp-content/uploads/2018/10/Pandas4.pdf

I did a little more digging. It looks like "WOM" is just an undocumented offset alias. Source: https://github.com/pandas-dev/pandas/issues/2289#issuecomment-269616457

See the pandas DateOffsets documentation:
WeekOfMonth - 'WOM' - the x-th day of the y-th week of each month
See also this example exercise:
Create a DateTimeIndex consisting of the third Thursday in each month for the years 2015 and 2016.
pd.date_range('2015-01-01', '2016-12-31', freq='WOM-3THU')
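As a cross-check on what the alias should produce, here is a minimal plain-Python sketch that computes the third Thursday of each month for 2015 and 2016 without pandas. The helper name nth_weekday is hypothetical and is not part of pandas; it only illustrates the "n-th weekday of the month" rule that the table describes.

```python
import datetime as dt

def nth_weekday(year, month, weekday, n):
    """Return the n-th (1-based) occurrence of a weekday in a month.

    weekday follows datetime's convention: Monday is 0, Sunday is 6.
    """
    first = dt.date(year, month, 1)
    # Days from the 1st of the month to the first requested weekday.
    shift = (weekday - first.weekday()) % 7
    return first + dt.timedelta(days=shift + 7 * (n - 1))

THURSDAY = 3

# One date per month, matching the range used in the exercise above.
third_thursdays = [
    nth_weekday(year, month, THURSDAY, 3)
    for year in (2015, 2016)
    for month in range(1, 13)
]

for day in third_thursdays[:3]:
    print(day.isoformat())
```

The third occurrence of a weekday always falls between the 15th and the 21st, which is a quick sanity check to apply to the date_range result as well.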

Related

Using Optaplanner for long trip planning of a fleet of vehicles in a Vehicle Routing Problem (VRP)

I am applying the VRP example of optaplanner with time windows, and I get feasible solutions whenever I define time windows in a range of 24 hours (00:00 to 23:59). But I also need to:
Manage long trips, where I know that the duration between leaving the depot and the first visit, or the durations between visits, will be more than 24 hours. Currently it does not give me workable solutions, because the TW format is a 24-hour format. When the scoring rule "arrivalAfterDueTime" is applied, the "arrivalTime" is always higher than the "dueTime", because the "dueTime" is in a range of (00:00 to 23:59) and the "arrivalTime" is on the next day.
I have thought that I should take each TW of each Customer and add more TW to it, one for each day that is planned.
For example, if I am planning a trip for 3 days, then I would have 3 time windows in each Customer. Something like this: if Customer 1 is available from [08:00-10:00], then it will also be available from [32:00-34:00] and [56:00-58:00], which are the equivalent of the same TW for the following days.
I also handle the times as long values, converted to milliseconds.
I don't know if this is the right way. My question is mostly a request for ideas on how to approach this constraint; if you have had a similar problem, any idea would be much appreciated.
Sorry for the wording, I am a Spanish speaker. Thank you.
Without having checked the example, handling multiple days shouldn't be complicated. It all depends on how you model your time variable.
For example, you could:
model the time stamps as a long value denoting seconds since epoch. This is how most of the examples are modeled, if I remember correctly. Note that this is not very human-readable, but it is the fastest to compute with
you could use a time data type, e.g. LocalTime. This is a human-readable time format, but it only works in the 24-hour range and will be slower than using a primitive data type
you could use a date-time data type, e.g. LocalDateTime. This is also human-readable and works in any time range, but it will also be slower than using a primitive data type.
I would strongly encourage you not to simply map the current day or current hour to a zero value and start counting from there. In your example you denote the times as [32:00-34:00]. This makes it appear as if you are using midnight of the current day as the 0th hour and counting from there. While you can do this, it will affect debugging and maintainability of your code. That is just my general advice; you don't have to follow it.
What I would advise is to have your own domain models and map them to Optaplanner models where you use a long value for any time stamp that is denoted as seconds since epoch.
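A minimal Python sketch of the epoch-seconds idea, for illustration only. It does not use OptaPlanner, and the helper names, dates, and travel time are all hypothetical. The point is that once every time stamp is an absolute count of seconds, a due-time check works the same whether the arrival is on the same day or days later.

```python
from datetime import datetime, timedelta, timezone

def to_epoch_seconds(moment):
    """Convert a timezone-aware datetime to whole seconds since epoch."""
    return int(moment.timestamp())

def seconds_late(arrival_s, due_s):
    """Return how many seconds an arrival exceeds the due time (0 if on time)."""
    return max(0, arrival_s - due_s)

# Hypothetical trip: leave the depot in the evening and drive 37 hours.
departure = datetime(2024, 1, 1, 20, 0, tzinfo=timezone.utc)
travel = timedelta(hours=37)
arrival_s = to_epoch_seconds(departure + travel)  # 2024-01-03 09:00 UTC

# The customer's 08:00-10:00 window, expressed for the actual visit day.
ready_s = to_epoch_seconds(datetime(2024, 1, 3, 8, 0, tzinfo=timezone.utc))
due_s = to_epoch_seconds(datetime(2024, 1, 3, 10, 0, tzinfo=timezone.utc))

print(ready_s <= arrival_s <= due_s)   # arrival falls inside the window
print(seconds_late(arrival_s, due_s))  # no lateness penalty
```

With a time-of-day value alone, the same arrival could not be distinguished from one on the departure day, which is the failure described in the question.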

Optaplanner: how to handle minimum number consecutive

Let's assume a variation on Nurse Rostering example in which instead of assigning a nurse to a shift on a day, the nurse is assigned to a variable number of timeblocks on that day (which consists of 24 timeblocks). eg: Nurse1 is assigned to timeblocks [8,9,10,11,12,13,14]. Let's call these consecutive assignments a ShiftPeriod. There is a hard minimum and maximum on these shiftperiods. However, optaplanner has difficulties finding a feasible solution.
When having hard consecutive constraints, is it better to model the planning entity as a startTimeBlock with a duration instead of my current way with assignment to a timeblock and a day and then imposing min/max consecutive?
Take a look at the meeting scheduling example on github master for 6.4.0.Beta1 (but the example will work perfectly with 6.3.0.Final too). Video and docs coming soon. That example uses the design pattern TimeGrains, which is what you're looking for I think.
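The alternative raised in the question (a start timeblock plus a duration) can be sketched in plain Python as below. This is only an illustration under assumed limits, not OptaPlanner code and not the TimeGrain implementation from the meeting scheduling example; the names MIN_LENGTH, MAX_LENGTH, blocks_of, and candidate_periods are hypothetical.

```python
BLOCKS_PER_DAY = 24
MIN_LENGTH = 4  # assumed hard minimum, for illustration only
MAX_LENGTH = 8  # assumed hard maximum, for illustration only

def blocks_of(start_block, length):
    """Expand a (start, length) pair into the timeblocks it covers."""
    return list(range(start_block, start_block + length))

def candidate_periods():
    """Enumerate every (start, length) pair that fits inside one day.

    Because only lengths within the limits are generated, each candidate
    is consecutive and satisfies the min/max rule by construction.
    """
    return [
        (start, length)
        for length in range(MIN_LENGTH, MAX_LENGTH + 1)
        for start in range(0, BLOCKS_PER_DAY - length + 1)
    ]

# The ShiftPeriod from the question: blocks 8 through 14.
print(blocks_of(8, 7))
print(len(candidate_periods()))
```

Modeled this way, the solver would only pick among valid periods, instead of having to discover consecutive runs of individual timeblock assignments through hard constraints.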

Google Bigquery table decorators

I need to add decorators that will represent the range from 6 days ago until now.
How should I do it?
Let's say the relative time is 604800000 millis from now and the absolute time is 1427061600000:
#-604800000
#1427061600000
#now in millis - 1427061600000
#1427061600000 - now in millis
Is there a difference between using relative and absolute times?
Thanks
#-518400000--1
Will give you data for the last 6 days (or last 144 hours).
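The offsets in these decorators are plain millisecond counts, so they are easy to derive. A small sketch of the arithmetic (the helper name to_millis is hypothetical); note that the 604800000 in the question is 7 days, while 6 days is 518400000:

```python
from datetime import timedelta

def to_millis(delta):
    """Convert a timedelta to a whole number of milliseconds."""
    return delta // timedelta(milliseconds=1)

six_days_ms = to_millis(timedelta(days=6))
seven_days_ms = to_millis(timedelta(days=7))
hours_in_six_days = timedelta(days=6) // timedelta(hours=1)

print(six_days_ms)        # 518400000
print(seven_days_ms)      # 604800000
print(hours_in_six_days)  # 144
```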
I think all you need is to read this.
Basically, you have the choice of #time, which is time since Epoch (your #1427061600000). You can also express it as a negative number, which the system will interpret as NOW - time (your #-604800000). These both work, but they don't give the result you want. Instead of returning everything that was added in that time range, they return a snapshot of your table from 6 days ago.
Although you COULD use that snapshot, eliminate all duplicates between that snapshot and your current table, and then take THOSE results as what was added during your 6 days, you're better off
using time ranges directly, which you cover with your 3rd and 4th lines. I don't know if the order makes a difference, but I've always used #time1-time2 with time1<time2 (in your case, #1427061600000 - now in millis).

Trying to get all records that have a timestamp in between two given dates

I have this huge database of records that have been created over the past 5 or so years. I'm thinking it would be cool (and edifying) to try to create some time categories/segments for these records, the unit could be week or month or something like that, something to use for a graph.
Anyway, I need to develop a query that, given a datetime attr for each record in the table, would return all the records with a datetime falling in between X and Y (June 1, 2011 & June 7, 2011, for example).
I'm not good at using the time helpers yet and could not find any sufficiently similar questions on SO or elsewhere.
Solutions that use subjective increments like "week" or "month" that rails can understand would be strongly appreciated. I know how tricky the calendar can get in programming. Or I could just use some lowest common denominator (day) and do an extremely fine graph.
Client.where(:created_at => X..Y)
Source: Ruby on Rails Guides

SOLR - Boost function (bf) to increase score of documents whose date is closest to NOW

I have a solr instance containing documents which have a 'startTime' field ranging from last month to a year from now. I'd like to add a boost query/function to boost the scores of documents whose startTime field is close to the current time.
So far I have seen a lot of examples that use rord to add boosts to documents that are newer, but I have never seen an example of something like this.
Can anyone tell me how to do it please?
Thanks
If you're on Solr 1.4+, then you have access to the "ms" function in function queries, and the standard, textbook approach to boosting by recency is:
recip(ms(NOW,startTime),3.16e-11,1,1)
ms gives the number of milliseconds between its two arguments. The expression as a whole boosts scores by 1 for docs dated now, by 1/2 for docs dated 1 year ago, by 1/3 for docs dated 2 years ago, etc. (See http://wiki.apache.org/solr/FunctionQuery#Date_Boosting, as Sean Timm pointed out.)
In your case you have docs dated in the future. For those, ms(NOW,startTime) is negative, so the above function misbehaves (the result grows past 1 and eventually turns negative), so you probably would want to throw in an absolute value, like this:
recip(abs(ms(NOW,startTime)),3.16e-11,1,1)
abs(ms(NOW,startTime)) will give the # of milliseconds between startTime and now, guaranteed to be nonnegative.
That would be a good starting place. If you want, you can then tweak the 3.16e-11 if it's too aggressive or not aggressive enough.
Tangentially, the ms function will only work on fields based on the TrieDate class, not the classic Date and LegacyDate classes. If your schema.xml was based on the example one for Solr 1.4, then your date field is probably already in the correct format.
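The arithmetic behind the recip expressions above can be checked outside Solr. The sketch below mirrors recip(x,m,a,b) as a / (m*x + b) in plain Python; the function recency_boost and the sample dates are hypothetical and only illustrate why the abs() version treats past and future documents symmetrically.

```python
MS_PER_YEAR = 365 * 24 * 60 * 60 * 1000  # roughly 3.15e10 milliseconds

def recip(x, m, a, b):
    """Mirror Solr's recip(x, m, a, b), which evaluates a / (m * x + b)."""
    return a / (m * x + b)

def recency_boost(now_ms, start_ms, use_abs=True):
    """Boost for a document, following the expressions in the answer."""
    diff = now_ms - start_ms  # what ms(NOW, startTime) evaluates to
    if use_abs:
        diff = abs(diff)
    return recip(diff, 3.16e-11, 1, 1)

now = 0  # arbitrary reference point for the illustration

print(recency_boost(now, now))                    # dated now
print(recency_boost(now, now - MS_PER_YEAR))      # one year ago, close to 1/2
print(recency_boost(now, now - 2 * MS_PER_YEAR))  # two years ago, close to 1/3
print(recency_boost(now, now + MS_PER_YEAR))      # one year ahead, same as one year ago

# Without abs(), a document half a year ahead gets a boost above 1.
print(recency_boost(now, now + MS_PER_YEAR // 2, use_abs=False))
```

Raising the 3.16e-11 multiplier makes the boost fall off faster with distance from now; lowering it flattens the curve.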
You can do date math in Solr 1.4.
http://wiki.apache.org/solr/FunctionQuery#Date_Boosting