How do you extract the date format "Month_name date, year" into separate columns of date, month and year in Pandas? E.g. "August 30, 2019"

I've seen extractions of date, month and year from date formats such as "DD-MM-YYYY" (where the month is numbered rather than named).
However, I have a dataset which has date values in the format: "Month_name date, year".
E.g. "August 30, 2019".

Assume that your DataFrame contains a TxtDate column with
date strings:
TxtDate
0 August 30, 2019
1 May 12, 2020
2 February 16, 2020
The first step is to convert the source column to datetime type and save it
in a new column:
df['Date'] = pd.to_datetime(df.TxtDate)
This function is "clever" enough that you can omit an explicit
format specification.
Then extract particular date components (and save them in respective
columns):
df['Year'] = df.Date.dt.year
df['Month'] = df.Date.dt.month
df['Day'] = df.Date.dt.day
The last step is to drop the Date column (you didn't write
that you need the whole date):
df.drop(columns='Date', inplace=True)
The result is:
TxtDate Year Month Day
0 August 30, 2019 2019 8 30
1 May 12, 2020 2020 5 12
2 February 16, 2020 2020 2 16
Maybe you should also drop TxtDate column (your choice).
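If you prefer not to rely on format inference, the steps above can be sketched with an explicit format string. This is a minimal sketch, assuming English month names and the sample data shown above; the MonthName column is an extra added for illustration, not something the question asked for.

```python
import pandas as pd

# Sample data mirroring the TxtDate column above.
df = pd.DataFrame(
    {"TxtDate": ["August 30, 2019", "May 12, 2020", "February 16, 2020"]}
)

# %B = full month name, %d = day of month, %Y = four-digit year.
parsed = pd.to_datetime(df["TxtDate"], format="%B %d, %Y")

# Pull the components out of the parsed dates without keeping a Date column.
df["Year"] = parsed.dt.year
df["Month"] = parsed.dt.month
df["MonthName"] = parsed.dt.month_name()  # illustrative extra column
df["Day"] = parsed.dt.day

print(df)
```

With an explicit format, a string that does not match raises an error instead of being parsed in some unexpected way.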

Related

Combine Metric and Dimension In Formula (Datastudio)

I am trying to figure out how to create a column where...
clicks * (CASE
When Date <= "Jan 1, 2020" then 5
when Date >= "Jan 2, 2020" then 10
end)
But the error I am getting is this:
Sorry, calculated fields can't mix metrics (aggregated values) and
dimensions (non-aggregated values). Please check the aggregation types
of the fields used in this formula
Date is the dimension and clicks are the metric.
What the result should look like:
DATE          CLICKS   RESULT
Jan 1, 2020   100      500
Jan 1, 2020   40       200
Jan 1, 2020   10       50
Jan 2, 2020   30       300
Jan 1, 2020   90       900
Is there a way to change Date into a Metric, or is there another way to approach this problem?
I think the CASE statement itself is written fine. Check the data type of the Date dimension: it should be set to the "Date" type so that Data Studio treats the values as dates, and the dates in the CASE statement should be written in the same format.
If you wrap the CASE inside a MAX, the result becomes an aggregated value (a metric), so the formula no longer mixes metrics and dimensions and you avoid the error.
clicks * MAX(
(CASE
WHEN Date <= "Jan 1, 2020" then 5
WHEN Date >= "Jan 2, 2020" then 10
END))
clicks * MAX(
(CASE
WHEN Date <= "Aug 15, 2021" then 2
WHEN Date > "Aug 15, 2021" then 4
END))
This worked!
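To sanity-check the numbers outside Data Studio, the per-row logic of the first formula can be sketched in Python. This is only an illustration of the arithmetic, not Data Studio syntax; the `multiplier` helper is a hypothetical name, and the rows are the first four from the expected-result table in the question.

```python
from datetime import date


def multiplier(d: date) -> int:
    # Mirrors the CASE expression: 5 up to Jan 1, 2020, 10 from Jan 2, 2020 on.
    return 5 if d <= date(2020, 1, 1) else 10


# (date, clicks) pairs taken from the question's example.
rows = [
    (date(2020, 1, 1), 100),
    (date(2020, 1, 1), 40),
    (date(2020, 1, 1), 10),
    (date(2020, 1, 2), 30),
]

results = [clicks * multiplier(d) for d, clicks in rows]
print(results)  # [500, 200, 50, 300]
```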

Pandas max dayofyear by year

I have a dataframe with a datetimeindex. There are multiple observations on the same day but different times.
I'm familiar with the dayofyear attribute. Is there a way to use this attribute to also determine the max dayofyear by year? The result would be something like:
2015 252
2016 250
2017 251
If I understand your question, you want to look at a list of dates and for each year get the maximum date for that year.
import pandas as pd
# Sample data (hourly timestamps spanning a year boundary)
df = pd.DataFrame({'date': pd.date_range(start='2018-12-24', end='2019-01-02', freq='h')})
df['dayofyear'] = df.date.dt.dayofyear
df['year'] = df.date.dt.year
# Largest day-of-year observed in each year
df.groupby('year').dayofyear.max()
Out:
year
2018 365
2019 2
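Since the question mentions a DatetimeIndex rather than a date column, the same idea can be sketched directly on the index. This is a minimal sketch with made-up sample data; the `value` column and the `max_doy` name are only placeholders.

```python
import pandas as pd

# Hypothetical frame with several observations per day on a DatetimeIndex.
idx = pd.date_range(start="2018-12-24", end="2019-01-02", freq="h")
df = pd.DataFrame({"value": range(len(idx))}, index=idx)

# Day-of-year for every row, grouped by the year taken from the same index.
day_of_year = pd.Series(df.index.dayofyear, index=df.index)
max_doy = day_of_year.groupby(df.index.year).max()

print(max_doy)
```

This avoids adding helper columns to the frame; the result is a Series keyed by year.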

Postgresql max date in dataset for each month

I have a table with the following columns (in addition to others):
name char
tanggal date
Rows are inserted into this table each day.
How can I get the formatted name for the max date of each month,
for example:
Jan 31, Feb 28, Mar 31, Apr 30,...
I am using Postgresql 8.3
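You could use extract to get the month of the date. From there on, it's a straightforward group by query. Note that grouping by the month number alone merges the same month from different years; if the table spans several years, group by date_trunc('month', tanggal) instead: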
SELECT MAX(tanggal)
FROM mytable
GROUP BY EXTRACT (MONTH FROM tanggal)
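To illustrate the grouping logic and the "Jan 31" style formatting outside the database, here is a small Python sketch. The dates are made up for the example, and it groups by (year, month) rather than by month alone so that different years are kept apart.

```python
from datetime import date

# Hypothetical values standing in for the tanggal column.
dates = [
    date(2009, 1, 30),
    date(2009, 1, 31),
    date(2009, 2, 27),
    date(2009, 2, 28),
    date(2010, 1, 15),
]

# Keep the latest date seen for each (year, month) pair.
latest = {}
for d in dates:
    key = (d.year, d.month)
    if key not in latest or d > latest[key]:
        latest[key] = d

# Format as abbreviated month name plus day (month names depend on the locale).
labels = [latest[key].strftime("%b %d") for key in sorted(latest)]
print(labels)
```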

Lookup a value from a range of dates in excel

Please help.
We have two tables. A list of accounts and the other is a change log. I need to add a new column in table 1 where the value is the amount in table 2 for the correct account and correct validity period.
ex.
table 1:
account#   beginning period    ending period
1          January 1, 2012     January 31, 2012
2          January 12, 2012    February 12, 2012
table 2:
account #  amount  valid period beg    valid period end
1          10      january 1, 2009     december 5, 2010
1          20      december 6, 2010    june 1, 2011
1          30      june 2, 2011        december 1, 2012
2          13      january 15, 2011    december 15, 2011
2          20      december 16, 2011   february 20, 2012
Thanks.
Although this is a somewhat complex requirement, it can be done with built-in functions (even if the result looks a bit obscure :-) ). Specifically, I mean the SUMIFS function.
It takes several parameters.
The first is the range of values to be summed. It is B8:B12 in this example.
The second is a range of values to be checked against some condition. It is A8:A12.
The third is the criterion applied to the range from the second parameter. Here it is (among others) the account #.
So the formula says: sum all values from rows in B8:B12 where the account # (A8:A12) equals the desired account # (e.g. A3).
That is not all: you also need to specify the time range. This is a bit clunky because you must check whether two time periods overlap (checking whether a single date falls within a period would be easier).
It can still be done, because SUMIFS accepts additional pairs of criteria range and criterion. However, these pairs cannot express an OR condition, so you have to combine several SUMIFS calls.
A nice article about overlapping ranges is e.g. http://chandoo.org/wp/2010/06/01/date-overlap-formulas/
BTW: you have to format the cells in B2:C3 and C8:D12 as dates to be able to compare them.
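The lookup logic itself (same account, overlapping periods) can be sketched in Python to check what the formula should return. This is not an Excel formula, just an illustration using the data from the two tables; `amount_for` is a hypothetical helper name.

```python
from datetime import date

# Table 1: (account, beginning period, ending period)
accounts = [
    (1, date(2012, 1, 1), date(2012, 1, 31)),
    (2, date(2012, 1, 12), date(2012, 2, 12)),
]

# Table 2: (account, amount, valid period beg, valid period end)
change_log = [
    (1, 10, date(2009, 1, 1), date(2010, 12, 5)),
    (1, 20, date(2010, 12, 6), date(2011, 6, 1)),
    (1, 30, date(2011, 6, 2), date(2012, 12, 1)),
    (2, 13, date(2011, 1, 15), date(2011, 12, 15)),
    (2, 20, date(2011, 12, 16), date(2012, 2, 20)),
]


def amount_for(account, beg, end):
    # Two periods overlap when each one starts on or before the other ends.
    return sum(
        amount
        for acc, amount, valid_beg, valid_end in change_log
        if acc == account and valid_beg <= end and beg <= valid_end
    )


amounts = {acc: amount_for(acc, beg, end) for acc, beg, end in accounts}
print(amounts)  # {1: 30, 2: 20}
```

Like SUMIFS, this sums every matching row, so if a period in table 1 straddles two validity periods the amounts are added together.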

Subtract 4 months from a pair of year/month values

I want to subtract 4 months from a period that is defined by a year and a month:
UPDATE [MAS_YCA].[dbo].[temp_AR_SalesPersonhistory]
SET FiscalYear = year(DATEADD(month,-4,DATEADD(DAY,-1,DATEADD(month,cast(FiscalPeriod as Int),DATEADD(year,cast(FiscalYear as Int)-1900,0))))),
FiscalPeriod = right('00'+cast(month(DATEADD(month,-4,DATEADD(DAY,-1,DATEADD(month,cast(FiscalPeriod as Int),DATEADD(year,cast(FiscalYear as Int)-1900,0))))) as varchar),02)
GO
The error I'm getting is: Adding value to a datetime column caused an overflow.
The FiscalYear and FiscalPeriod fields are both defined as varchar in the table.
Sounds like you need to check the data in the FiscalYear and FiscalPeriod columns. Most likely, you have an invalid year in the FiscalYear column.
The range of the datetime type in SQL Server is January 1, 1753, through December 31, 9999, so any year outside this range will cause your error.
An easy check (edit, added null and empty string cases):
Select * from [MAS_YCA].[dbo].[temp_AR_SalesPersonhistory]
where cast(FiscalYear as Int) > 9999 or cast(FiscalYear as Int) < 1753
or FiscalYear is NULL or FiscalYear = ''
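For reference, the year/month arithmetic itself does not need real dates at all. Below is a minimal Python sketch of subtracting months from a (year, period) pair by counting months from year 0; `subtract_months` is a hypothetical helper, and the zero-padding mirrors the two-character FiscalPeriod values in the question.

```python
def subtract_months(year: int, month: int, months: int = 4) -> tuple:
    # Convert to a running month count (month 1 -> offset 0), subtract, convert back.
    total = year * 12 + (month - 1) - months
    new_year, month_offset = divmod(total, 12)
    return new_year, month_offset + 1


# The columns are varchar in the question, so cast in and format back out.
fiscal_year, fiscal_period = "2012", "03"
new_year, new_month = subtract_months(int(fiscal_year), int(fiscal_period))
print(str(new_year), f"{new_month:02d}")  # 2011 11
```

Because no date value is ever built, an out-of-range year cannot cause a datetime overflow here, although non-numeric strings would still fail at the int() conversion.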