How to group into 2 week increments by created_at [ruby] - sql

I have survey responses that I need to display in 2 week increments based on the created_at date. The output should be something like:
{10/1 : 4
10/15: 6
10/29: 3}
...where the first period starts from the earliest created_at date in the survey responses and the last period covers the latest created_at. I've seen things like group_by { |s| s.created_at.month }, but nothing for every other week, starting on the Monday of the week. Any help would be much appreciated!

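You could calculate the number of days between the current record and the oldest one, and then integer-divide by 14 so that every record in the same 14-day window gets the same key (modulo 14 would instead give the day offset within a window):
oldest_date = YourModel.minimum(:created_at).to_date
your_relation.group_by { |record|
  (record.created_at.to_date - oldest_date).to_i / 14
}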

You could define a method returning the year and the pair of week numbers, for example:
def by_year_and_by_two_weeks(my_date)
  wk = my_date.strftime("%W").to_i # week of the year (00..53), weeks start on Monday
  first = wk - wk % 2 # even week number that opens the pair <== adjust this
  [my_date.year, [first, first + 1]]
end
%W in Ruby's strftime uses Monday as the first day of the week.
So, when you have your object:
object.created_at #=> 2021-09-19 08:58:16.78053 +0200
by_year_and_by_two_weeks(object.created_at) #=> [2021, [36, 37]]
Then you can use the method for grouping:
objects.group_by { |d| by_year_and_by_two_weeks(d.created_at) }
This is an example of the result, after transforming the values to make it readable:
{[2021, [40, 41]]=>[2021-10-14 09:00:17.421142 +0200, 2021-10-15 09:00:17.421224 +0200, 2021-10-06 09:00:17.421276 +0200, 2021-10-10 09:00:17.421328 +0200], [2021, [38, 39]]=>[2021-09-22 09:00:17.421385 +0200]}
Of course, you can change the by_year_and_by_two_weeks return value to whatever fits you best.

Your requirements:
You want to group on the Monday that starts each biweekly period.
You want the Hash key of each group to be the date of that Monday.
I will also add another future-proofing requirement:
The start and end dates can be anywhere, even across year boundaries.
If we take these requirements and combine the days-since-start idea from spickermann with modulo, we can build it like this:
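# beginning_of_week(:monday) returns the date itself when it already is a Monday,
# whereas prev_occurring(:monday) would jump back a full week in that case
start_date = first_item.created_at.to_date.beginning_of_week(:monday)
your_items.group_by { |item|
  item_date = item.created_at.to_date
  days_from_start = item_date - start_date
  biweekly_offset = days_from_start.modulo(14)
  biweekly_monday = item_date - biweekly_offset
  biweekly_monday
}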
Example:
test_dates = [
Date.new(2021, 10, 1),
Date.new(2021, 10, 6),
Date.new(2021, 10, 10),
Date.new(2021, 10, 13),
Date.new(2021, 10, 20),
Date.new(2021, 10, 31)
]
start = test_dates.first.beginning_of_week(:monday)
pp test_dates.group_by { |date|
  days_from_start = date - start
  biweekly_offset = days_from_start.modulo(14)
  biweekly_monday = date - biweekly_offset
  biweekly_monday
}
Output:
{ Mon, 27 Sep 2021 => [Fri, 01 Oct 2021, Wed, 06 Oct 2021, Sun, 10 Oct 2021],
Mon, 11 Oct 2021 => [Wed, 13 Oct 2021, Wed, 20 Oct 2021],
Mon, 25 Oct 2021 => [Sun, 31 Oct 2021] }
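For readers outside Rails, the same Monday-anchored bucketing can be sketched in plain Python. This is only an illustration under assumptions: the function name group_biweekly is made up here, and the input is a list of datetime.date values rather than ActiveRecord objects.

```python
from collections import defaultdict
from datetime import date, timedelta


def group_biweekly(dates):
    """Group dates into 14-day buckets keyed by the Monday that opens each bucket."""
    earliest = min(dates)
    # weekday() is 0 for Monday, so this is the Monday on or before the earliest date
    start = earliest - timedelta(days=earliest.weekday())
    groups = defaultdict(list)
    for d in dates:
        offset = (d - start).days % 14  # days since the bucket's Monday
        groups[d - timedelta(days=offset)].append(d)
    return dict(groups)


test_dates = [
    date(2021, 10, 1), date(2021, 10, 6), date(2021, 10, 10),
    date(2021, 10, 13), date(2021, 10, 20), date(2021, 10, 31),
]
for monday, members in sorted(group_biweekly(test_dates).items()):
    print(monday, len(members))
```

For the question's "10/1 : 4" style of output, the length of each group is the count to display.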

Related

Add datetime to datetime

My birth year- 1975
Month- 08
Day- 28
Hour- 19
Minute- 10
Second- 22
I want to add 1.746387366 years to my birth date and time (year, month, day, hour, minute, second) to get a new year, month, day, hour, minute and second.
How can I do that?
Thanks.
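You create a DateTime for your birth moment:
Dim birthdate = New DateTime(1975, 8, 28, 19, 10, 22)
AddYears only accepts a whole number of years (an Integer), so add the whole year first and convert the fractional part to days, here assuming an average year of 365.25 days:
Dim result = birthdate.AddYears(1).AddDays(0.746387366 * 365.25)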
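A minimal Python sketch of adding a fractional number of years. It rests on an assumption the question does not settle: the fractional part is converted to days using an average year of 365.25 days. The helper name add_fractional_years is hypothetical.

```python
from datetime import datetime, timedelta

DAYS_PER_YEAR = 365.25  # assumption: average length of a year in days


def add_fractional_years(moment, years):
    """Shift by the whole years on the calendar, then add the fraction as days."""
    whole = int(years)
    fraction = years - whole
    try:
        shifted = moment.replace(year=moment.year + whole)
    except ValueError:
        # Feb 29 in a target year that is not a leap year: fall back to Feb 28
        shifted = moment.replace(year=moment.year + whole, day=28)
    return shifted + timedelta(days=fraction * DAYS_PER_YEAR)


birth = datetime(1975, 8, 28, 19, 10, 22)
result = add_fractional_years(birth, 1.746387366)
print(result)
```

A different convention, such as scaling the fraction by the actual length of the year it falls in, would shift the result by up to a few hours.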

How do you extract the date format "Month_name date, year" into separate columns of date, month and year in Pandas? For eg. "August 30, 2019"

I've seen extractions of date, month and year from data format: "DD-MM-YYYY" and the like. (Where the month is numbered rather than named)
However, I have a dataset which has date values in the format: "Month_name date, year".
Eg. "August 30, 2019".
Assume that your DataFrame contains a TxtDate column with
date strings:
             TxtDate
0    August 30, 2019
1       May 12, 2020
2  February 16, 2020
The first step is to convert the source column to datetime type and save it
in a new column:
df['Date'] = pd.to_datetime(df.TxtDate)
This function is clever enough that you can even omit an explicit
format specification.
Then extract the particular date components (and save them in their respective
columns):
df['Year'] = df.Date.dt.year
df['Month'] = df.Date.dt.month
df['Day'] = df.Date.dt.day
And the last step is to drop Date column (you didn't write
that you need the whole date):
df.drop(columns='Date', inplace=True)
The result is:
             TxtDate  Year  Month  Day
0    August 30, 2019  2019      8   30
1       May 12, 2020  2020      5   12
2  February 16, 2020  2020      2   16
Maybe you should also drop TxtDate column (your choice).
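If you would rather spell the format out than rely on inference, the "Month_name date, year" layout corresponds to the strptime pattern "%B %d, %Y". The sketch below uses only the standard library to show that pattern. The helper name split_date is made up for illustration, and %B assumes English month names (the default C locale).

```python
from datetime import datetime


def split_date(text):
    """Parse 'Month_name day, year' and return (year, month, day) as integers."""
    parsed = datetime.strptime(text, "%B %d, %Y")
    return parsed.year, parsed.month, parsed.day


for text in ["August 30, 2019", "May 12, 2020", "February 16, 2020"]:
    print(text, split_date(text))
```

The same pattern string can be passed to pd.to_datetime through its format parameter.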

Sort months from january to december of months column in dataframe df [duplicate]

I have a Series object that has:
date price
dec 12
may 15
apr 13
..
Problem statement: I want to group by month, compute the mean price for each month, and present the result sorted in calendar order.
Desired Output:
month mean_price
Jan XXX
Feb XXX
Mar XXX
I thought of making a list and passing it to a sort function:
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
but sort_values doesn't support that for a Series.
One big problem I have is that even though
df.sort_values(by='date', ascending=True, inplace=True)
works on the initial df, the order coming out of the sorted df is not maintained after a groupby.
To conclude, I need these two columns from the initial data frame. I sorted the datetime column, but grouping by month name (dt.strftime('%B')) messed up the sorting. Now I have to sort by month name.
My code:
df # has 5 columns though I need the column 'date' and 'price'
df.sort_values(by='date',inplace=True) #at this part it is sorted according to date, great
total=(df.groupby(df['date'].dt.strftime('%B'))['price'].mean()) # Though now it is not as it was but instead the months appear alphabetically
You can use categorical data to enable proper sorting with pd.Categorical:
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
"Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
df['months'] = pd.Categorical(df['months'], categories=months, ordered=True)
df.sort_values(...) # same as you have now; can use inplace=True
When you specify the categories, pandas remembers the order of specification as the default sort order.
Docs: Pandas categories > sorting & order.
You should consider re-indexing along axis 0 (the index); this works when the month names are the index, as in the groupby result:
new_order = ['January', 'February', 'March', 'April', 'May', 'June', 'July', 'August', 'September', 'October', 'November', 'December']
df1 = df.reindex(new_order, axis=0)
Thanks @Brad Solomon for offering a faster way to capitalize strings!
Note 1
@Brad Solomon's answer using pd.Categorical should use fewer resources than mine. He showed how to assign an order to your categorical data. Don't miss it :P
Alternatively, you can convert the month names to month numbers and sort on those:
import pandas as pd
df = pd.DataFrame([["dec", 12], ["jan", 40], ["mar", 11], ["aug", 21],
                   ["aug", 11], ["jan", 11], ["jan", 1]],
                  columns=["Month", "Price"])
# Preprocessing: capitalize `jan`, `dec` to `Jan` and `Dec`
df["Month"] = df["Month"].str.capitalize()
# Now the dataset should look like
# Month Price
# -----------
# Dec XX
# Jan XX
# Apr XX
# make it a datetime so that we can sort it:
# use %b because the data use the abbreviation of month
df["Month"] = pd.to_datetime(df.Month, format='%b', errors='coerce').dt.month
df = df.sort_values(by="Month")
total = (df.groupby(df['Month'])['Price'].mean())
# total
# Month
# 1     17.333333
# 3     11.000000
# 8     16.000000
# 12    12.000000
Note 2
groupby sorts the group keys by default. Be sure to use the same key for sorting and grouping, i.e. in df = df.sort_values(by=SAME_KEY) and total = (df.groupby(df[SAME_KEY])['Price'].mean()). Otherwise, you may get unintended behavior. See "Groupby preserve order among groups? In which way?" for more information.
Note 3
A more computationally efficient way is to compute the mean first and then sort by month. That way you only need to sort 12 items rather than the whole df, which reduces the computational cost if you don't need df itself to be sorted.
Note 4
For people who already have the month as the index and wonder how to make it categorical, take a look at pandas.CategoricalIndex. @jezrael has a working example of making a categorical index ordered in "Pandas series sort by month index".
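Note 3's idea (aggregate first, then sort only the group keys) can be sketched in plain Python without pandas. This is an illustration, not the answer's code: the function name mean_price_by_month is made up, the rows reuse the sample data above, and calendar.month_abbr is assumed to yield English abbreviations (the default C locale).

```python
import calendar
from collections import defaultdict
from statistics import mean

# "Jan" -> 1, ..., "Dec" -> 12; month_abbr[0] is an empty string and is skipped
MONTH_NUMBER = {abbr: n for n, abbr in enumerate(calendar.month_abbr) if abbr}


def mean_price_by_month(rows):
    """Average the prices per month, then sort the (at most 12) results by calendar order."""
    prices = defaultdict(list)
    for month, price in rows:
        prices[month.capitalize()].append(price)
    means = {month: mean(values) for month, values in prices.items()}
    return sorted(means.items(), key=lambda item: MONTH_NUMBER[item[0]])


rows = [("dec", 12), ("jan", 40), ("mar", 11), ("aug", 21),
        ("aug", 11), ("jan", 11), ("jan", 1)]
result = mean_price_by_month(rows)
for month, price in result:
    print(month, round(price, 6))
```

Only the aggregated keys are sorted, so the cost of the sort does not grow with the number of rows.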
I would use the calendar module and reindex:
series.str.capitalize capitalizes the series; then we create a dictionary with the calendar module and map it over the series to get the month number.
Once we have the month number, we can sort_values() and get the index, then reindex.
import calendar
df.date=df.date.str.capitalize() #capitalizes the series
d={i:e for e,i in enumerate(calendar.month_abbr)} #creates a dictionary
#d={i[:3]:e for e,i in enumerate(calendar.month_name)}
df.reindex(df.date.map(d).sort_values().index) #map + sort_values + reindex with index
date price
2 Apr 13
1 May 15
0 Dec 12
Use the Sort_Dataframeby_Month function to sort month names in chronological order.
Packages you need to install:
$ pip install sorted-months-weekdays
$ pip install sort-dataframeby-monthorweek
example:
from sorted_months_weekdays import *
from sort_dataframeby_monthorweek import *
df = pd.DataFrame([['Jan',23],['Jan',16],['Dec',35],['Apr',79],['Mar',53],['Mar',12],['Feb',3]], columns=['Month','Sum'])
df
Out[11]:
Month Sum
0 Jan 23
1 Jan 16
2 Dec 35
3 Apr 79
4 Mar 53
5 Mar 12
6 Feb 3
To sort the dataframe by month, use the function below:
Sort_Dataframeby_Month(df=df,monthcolumnname='Month')
Out[14]:
Month Sum
0 Jan 23
1 Jan 16
2 Feb 3
3 Mar 53
4 Mar 12
5 Apr 79
6 Dec 35
You can add the numerical month value together with the name in the index (i.e. "01 January"), sort, and then strip off the number:
total=(df.groupby(df['date'].dt.strftime('%m %B'))['price'].mean()).sort_index()
It may look something like this:
01 January xxx
02 February yyy
03 March zzz
04 April ttt
total.index = [ x.split()[1] for x in total.index ]
January xxx
February yyy
March zzz
April ttt

SQL to Count status text and change some of those status texts (and their count) based on a date

I spent a while looking for this but did not quite find a solution. The problem: get counts for each status value in one column. Pretty simple so far; however, I also want to count a row as Late instead of under its own status when its DueDate (which can be null) is past due relative to the current date and the status is NOT Complete. Empty strings for the status should be included too. Example data (forgive the dates shown as text; assume a SQL date type, I just wanted to make this more readable):
Current Date: April 4, 2016
Data In DB Table DispatchStatus
Status       Due Date
=======================================
Complete     Mar 1, 2015     <-- would not be Late since Complete
Complete     null
Complete     July 12, 2016
Complete     July 16, 2016
Started      Mar 3, 2017
Started      null
Started      Feb 9, 2015     <-- Late
OnDevice     June, 2016
OnDevice     Dec 3, 2015     <-- Late
Dispatched   Nov 16, 2015    <-- Late
Dispatched   null
Dispatched   Nov 20, 2016
             Nov 15, 2017
             null
             Jan 15, 2016    <-- Late
The query should return:
Status       Count
=========================
Complete     4
Started      2
OnDevice     1
             2
Dispatched   2
Late         4
Thanks!
Use a CASE expression together with COUNT:
DECLARE @currentDate DATE = '20160404'
SELECT
    Status =
        CASE
            WHEN DueDate <= @currentDate AND ISNULL(Status, '') <> 'Complete' THEN 'Late'
            ELSE Status
        END,
    COUNT(*)
FROM DispatchStatus
GROUP BY
    CASE
        WHEN DueDate <= @currentDate AND ISNULL(Status, '') <> 'Complete' THEN 'Late'
        ELSE Status
    END
I think your sample current date should be Apr 4, 2016 based on your sample output.
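To check the CASE logic against the sample data without a database, the same rule can be sketched in Python. Two assumptions are made here: None stands in for NULL, and the row given only as "June, 2016" is treated as June 1, 2016, since the question leaves the day out. The function name effective_status is hypothetical.

```python
from collections import Counter
from datetime import date

CURRENT_DATE = date(2016, 4, 4)

# (status, due_date) rows from the question; None stands for NULL.
# Assumption: "June, 2016" has no day in the question, so the 1st is used.
rows = [
    ("Complete", date(2015, 3, 1)), ("Complete", None),
    ("Complete", date(2016, 7, 12)), ("Complete", date(2016, 7, 16)),
    ("Started", date(2017, 3, 3)), ("Started", None),
    ("Started", date(2015, 2, 9)),
    ("OnDevice", date(2016, 6, 1)), ("OnDevice", date(2015, 12, 3)),
    ("Dispatched", date(2015, 11, 16)), ("Dispatched", None),
    ("Dispatched", date(2016, 11, 20)),
    ("", date(2017, 11, 15)), ("", None), ("", date(2016, 1, 15)),
]


def effective_status(status, due_date):
    """Mirror the CASE expression: past due and not Complete counts as Late."""
    if due_date is not None and due_date <= CURRENT_DATE and status != "Complete":
        return "Late"
    return status


counts = Counter(effective_status(status, due) for status, due in rows)
for status, count in counts.items():
    print(repr(status), count)
```

As in the SQL, a NULL due date never satisfies the comparison, so those rows keep their original status.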

When was this clock bought?

The clock on the gym wall also shows the day name and the day of the month. This morning it showed Tuesday - 23.
The day obviously rotates through a cycle of 7 - and showed "Tuesday" correctly. The day of the month, though, presumably rotates through a cycle of 31 and showed "23" incorrectly - today is the 1st December (ie. not the 31st November). So this error has been slowly accruing over time.
Given the assumption that no one has ever reset the clock, what's the most elegant, quick, optimised way of suggesting some of the possible dates for when this clock was bought brand new?
(Bonus credit for showing when the clock will again show the correct day/number combination.)
01-Oct-17 is when the clock will again show the correct day/number combination.
The day of the week (i.e. Tuesday, ... etc) will always be correct, so it is irrelevant to your problem.
Assuming a non-leap year, you can build a table of 12 rows (1 per month) containing the number of days in each month minus 31.
Jan 0
Feb -3
Mar 0
Apr -1
May 0
Jun -1
Jul 0
Aug 0
Sep -1
Oct 0
Nov -1
Dec 0
You can build a table of the day displayed on the 1st of every month by taking the day displayed on the 1st of the previous month and adding that previous month's number from this list. If the result is zero or negative, add 31.
For example, to go from the 1st Dec 09 (the date on which the clock displays 23) to the 1st Jan 10:
Look up the figure next to Dec in the table: it is 0.
Add 0 to 23, and you know that on the 1st Jan 10 the clock will display 23.
From the 1st Jan 10, the figure for Jan is also 0, so the day displayed on the 1st Feb 10 is 23.
From the 1st Feb 10, you can compute the value for the 01 Mar 10: it is 23 + (-3) = 20.
... etc
So, for every month that starts with a value of 1 in this table, you know that the dates in that month will be correct.
If you have to include leap years, you need a second table with the values for a leap year, or you make an exception for February.
If you want to use this computation for earlier dates, subtract the figure from the table instead, and when the number you obtain is over 31, subtract 31 to get the day number.
Using these tables and taking leap years into account:
The last date on which the clock was correct was the 30 September 08 (it was correct between the 01-Jul-08 and the 30-Sep-08).
The next date on which it will be correct is the 01-Oct-17, and it will still be correct on the 30-Nov-17.
Now = 1 Dec 2009.
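The clock shows 23 where it should show 1, so it is 9 days behind (23 + 9 = 32, which wraps around to 1 on a 31-day cycle).
Moving back, counting the days lost in each month shorter than 31 days...
Nov, Sep, June, Apr, Feb (X3), Nov, Sep = 9 days offset
So it was last correct in Sep 2008, and was bought some time before Oct 2008?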
I didn't code a single line for it, so pardon me if the answer is way off.
In Excel, you can test any date in A2 to see whether the clock will be correct on that date, with the formula =MOD(A2+19,31)+1=DAY(A2)
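The same check can be sketched in Python by anchoring on the one known observation: 23 displayed on 1 Dec 2009. The year is the one the answers above assume, not something the question states. The function names are made up for illustration, and the sketch assumes the display simply advances by one each day on a 1..31 cycle.

```python
from datetime import date, timedelta

ANCHOR_DATE = date(2009, 12, 1)  # assumption: the year used in the answers above
ANCHOR_SHOWN = 23                # day number the clock displayed on that date


def shown_day(d):
    """Day number the clock displays on date d, cycling through 1..31."""
    return (ANCHOR_SHOWN - 1 + (d - ANCHOR_DATE).days) % 31 + 1


def is_correct(d):
    return shown_day(d) == d.day


def nearest_correct(start, step):
    """Walk from start in steps of `step` days (+1 or -1) to the first correct date."""
    d = start + timedelta(days=step)
    while not is_correct(d):
        d += timedelta(days=step)
    return d


print(nearest_correct(ANCHOR_DATE, +1))  # next date on which the display is right
print(nearest_correct(ANCHOR_DATE, -1))  # most recent date on which it was right
```

Walking further back from the most recent correct date lists earlier windows in which the clock could have been set correctly, which are the candidate purchase dates.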