df.replace not having any effect when trying to replace dates in pandas dataframe

I've been through the various comments on here about df.replace but I'm still not able to get it working.
Here is a snippet of my code:
# Name columns
df_yearly.columns = ['symbol', 'date', 'annual % price change']
# Parse the date strings (stored as M/D/Y)
df_yearly['date'] = pd.to_datetime(df_yearly['date'], format='%m/%d/%Y')
The df_yearly dataframe looks like this:
|   | symbol | date       | annual % price change |
|---|--------|------------|-----------------------|
| 0 | APX    | 12/31/2017 |                       |
| 1 | APX    | 12/31/2018 | -0.502554278          |
| 2 | AURA   | 12/31/2018 | -0.974450706          |
| 3 | BASH   | 12/31/2016 | -0.998110828          |
| 4 | BASH   | 12/31/2017 | 8.989361702           |
| 5 | BASH   | 12/31/2018 | -0.083599574          |
| 6 | BCC    | 12/31/2017 | 121718.9303           |
| 7 | BCC    | 12/31/2018 | -0.998018734          |
I want to replace all dates of 12/31/2018 with 06/30/2018. The next section of my code is:
# Replace 31-12-2018 with 30-06-2018 as this is the final date in the monthly DF
df_yearly_1 = df_yearly.date.replace('31-12-2018', '30-06-2018')
print(df_yearly_1)
But the output still comes out unchanged:
0    2017-12-31
1    2018-12-31
2    2018-12-31
3    2016-12-31
4    2017-12-31
5    2018-12-31
Is anyone able to help me with this? I thought the date format in my df.replace statement might be wrong, but I've also tried searching for and replacing 12-31-2018 and it still does nothing.
Thanks in advance!!

The replace does nothing because, after pd.to_datetime, the column holds datetime64 values, so the string '31-12-2018' never matches anything. One option is to convert to strings first (note they print as YYYY-MM-DD) and replace:
df_yearly.date.astype(str).replace('2018-12-31', '2018-06-30')
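If you'd rather keep the column as real datetimes instead of strings, a minimal sketch (assuming the column was parsed with pd.to_datetime as above) is to replace one Timestamp with another, which leaves the datetime64 dtype intact:

import pandas as pd

# replace the datetime value directly; the column stays datetime64
df_yearly['date'] = df_yearly['date'].replace(
    pd.Timestamp('2018-12-31'), pd.Timestamp('2018-06-30')
)
print(df_yearly['date'].head())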

Related

Pandas sum values between two dates in the most efficient way?

I have one dataset reporting production once per week and another reporting production every hour, split across several subproduction columns. I would now like to compare the sum of all this hourly subproduction with the value reported each week, as efficiently as possible. How can I achieve this? I would like to avoid a for loop at all costs, as my dataset is really large.
So my dataset looks like this:
Weekly reported data:
Datetime_text | Total_Production_A
--------------------------|--------------------
2014-12-08 00:00:00.000 | 8277000
2014-12-15 00:00:00.000 | 8055000
2014-12-22 00:00:00.000 | 7774000
Hourly data:
Datetime_text | A_Prod_1 | A_Prod_2 | A_Prod_3 | ...... | A_Prod_N |
--------------------------|-----------|-----------|-----------|-----------|-----------|
2014-12-06 23:00:00.000 | 454 | 9 | 54 | 104 | 4 |
2014-12-07 00:00:00.000 | 0 | NaV | 0 | 23 | 3 |
2014-12-07 01:00:00.000 | 54 | 0 | 4 | NaV | 20 |
and so on. I would like a new table where the difference between the weekly reported data and the summed hourly data is calculated for every date in the weekly data. So something like this:
Datetime_text | Diff_Production_A
--------------------------|------------------
2014-12-08 00:00:00.000 | 10
2014-12-15 00:00:00.000 | -100
2014-12-22 00:00:00.000 | 1350
where Diff_Production_A = Total_Production_A - sum(A_Prod_1, A_Prod_2, A_Prod_3, ..., A_Prod_N over all datetimes of that week). How can I best achieve this?
Any help in this regard would be greatly appreciated :D
Best
fidu13
Store the datetimes as pd.Timestamp; then you can do all kinds of manipulation on the dates.
For your problem, the key is to group the hourly data by week (anchored to Mondays, matching the weekly report dates), then merge it with the weekly data and calculate the differences:
weekly["Datetime"] = pd.to_datetime(weekly["Datetime_Text"])
hourly["Datetime"] = pd.to_datetime(hourly["Datetime_Text"])
hourly["HourlyTotal"] = hourly.loc[:, "A_Prod_1":"A_Prod_N"].sum(axis=1)
result = (
hourly.groupby(pd.Grouper(key="Datetime", freq="W-MON"))["HourlyTotal"]
.sum()
.to_frame()
.merge(
weekly[["Datetime", "Total_Production_A"]],
how="outer",
left_index=True,
right_on="Datetime",
)
.assign(Diff=lambda x: x["Total_Production_A"] - x["HourlyTotal"])
)
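For intuition, here is a quick check of what the W-MON grouper produces, using a couple of the sample rows from the question (values abridged; the NaV entries are read as NaN, which .sum() skips by default):

import pandas as pd

hourly = pd.DataFrame({
    "Datetime_Text": ["2014-12-06 23:00:00.000", "2014-12-07 00:00:00.000"],
    "A_Prod_1": [454, 0],
    "A_Prod_2": [9, None],
    "A_Prod_N": [4, 3],
})

hourly["Datetime"] = pd.to_datetime(hourly["Datetime_Text"])
hourly["HourlyTotal"] = hourly.loc[:, "A_Prod_1":"A_Prod_N"].sum(axis=1)
totals = hourly.groupby(pd.Grouper(key="Datetime", freq="W-MON"))["HourlyTotal"].sum()
print(totals)  # one row per week ending on a Monday: 2014-12-08 -> 470.0

Both hourly rows fall into the bin labelled 2014-12-08, so they line up with the weekly report dated that Monday.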

sum according to field condition

Can you please help me with this in Crystal Reports?
|field1 | field2 |field3 |
|-----------|-----------|-------|
|code1 | abc | 12.00 |
|code2 | xyz | 11.00 |
|code3 | cde | 12.00 |
|code4 | yabc | 2.00 |
|code5 | xabc | 2.00 |
|code6 | xxyzx | 3.00 |
|code7 | fgfgf | 43.00 |
code8 and so on....
I want to sum together all rows whose field2 contains "abc", "xyz", and so on; rows that don't match any of these should just show their own name, as above.
result should be something like:
|-----|------|
|ABC | 16.00|
|XYZ | 14.00|
|code3| 12.00|
|code7| 43.00|
code8 and so on...
Note: rows that were summed into one of the quoted groups are not listed individually in the required result.
I'm still a newbie at Crystal Reports.
Thank you in advance
grace
Create an If-Then-Else formula that returns the value when the condition is true and 0 otherwise.
Then SUM that formula at whatever level of aggregation (report level or group level) you need.
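Crystal syntax aside, the pattern the answer describes — map each row to its value when it matches, 0 otherwise, then sum — is easy to sanity-check outside the report. A sketch in pandas, using the sample rows above (column names hypothetical):

import pandas as pd

df = pd.DataFrame({
    "field2": ["abc", "xyz", "cde", "yabc", "xabc", "xxyzx"],
    "field3": [12.00, 11.00, 12.00, 2.00, 2.00, 3.00],
})

for key in ["abc", "xyz"]:
    # value if field2 contains the key, else 0 -- then sum
    print(key.upper(), df["field3"].where(df["field2"].str.contains(key), 0).sum())
# ABC 16.0
# XYZ 14.0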

Writing a date range in access query

I'm hoping to write a query in access that will show the week date range from Sunday to Saturday.
For instance, this week's would be formatted like: 10/15/17 - 10/21/17
Not sure how to even begin with this.
I do have a column of the Week Number using the formula: DatePart("ww",[date]).
It seems logical to write something that says
if the week number is the same, minimum date & " - " & maximum date
but I have no idea how to write this in a query, or whether this would need VBA...
Here is essentially how the table looks. Column C is how I would like the query data to look once the query runs:
| Date | Week | Date Range |
|---------|------|-------------------|
| 8/1/17 | 1 | 8/1/17 - 8/7/17 |
| 8/4/17 | 1 | 8/1/17 - 8/7/17 |
| 8/7/17 | 1 | 8/1/17 - 8/7/17 |
| 8/8/17 | 2 | 8/8/17 - 8/14/17 |
| 8/11/17 | 2 | 8/8/17 - 8/14/17 |
| 8/14/17 | 2 | 8/8/17 - 8/14/17 |
| 8/15/17 | 3 | 8/15/17 - 8/21/17 |
| 8/18/17 | 3 | 8/15/17 - 8/21/17 |
| 8/21/17 | 3 | 8/15/17 - 8/21/17 |
Any help would be appreciated!
You can use Weekday, which tells you the day number of the week your date falls on. A quick calculation returns Sunday's date; add 7 instead of 1 and you get the following Saturday's date.
SELECT MyDate - Weekday(MyDate, 1) + 1 & " - " & MyDate - Weekday(MyDate, 1) + 7 AS DateRange
FROM Table1;
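If you want to sanity-check the Sunday-to-Saturday arithmetic outside Access, here is a small sketch in Python (Weekday(d, 1) numbers Sunday as 1; the modulo below reproduces that anchoring):

import datetime as dt

d = dt.date(2017, 10, 18)  # any date in the week of interest
sunday = d - dt.timedelta(days=(d.weekday() + 1) % 7)
saturday = sunday + dt.timedelta(days=6)
print(f"{sunday:%m/%d/%y} - {saturday:%m/%d/%y}")  # 10/15/17 - 10/21/17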

Pandas: need to create dataframe for weekly search per event occurrence

If I have this events dataframe df_e below:
| group | event date | count |
|-------|------------|-------|
| x123  | 2016-01-06 | 1     |
|       | 2016-01-08 | 10    |
|       | 2016-02-15 | 9     |
|       | 2016-05-22 | 6     |
|       | 2016-05-29 | 2     |
|       | 2016-05-31 | 6     |
|       | 2016-12-29 | 1     |
| x124  | 2016-01-01 | 1     |
...
and I also know t0, the beginning of time (let's say for x123 it's 2016-01-01), and tN, the end of the experiment (2017-05-25), from another dataframe df_s. How can I create the dataframe df_new, which should look like this:
| group | obs. weekly | lifetime, week | status |
|-------|-------------|----------------|--------|
| x123  | 2016-01-01  | 1              | 1      |
|       | 2016-01-08  | 0              | 0      |
|       | 2016-01-15  | 0              | 0      |
|       | 2016-01-22  | 1              | 1      |
|       | 2016-01-29  | 2              | 1      |
...
|       | 2017-05-18  | 1              | 1      |
|       | 2017-05-25  | 1              | 1      |
...
| x124  | 2017-05-18  | 1              | 1      |
| x124  | 2017-05-25  | 1              | 1      |
Explanation: take t0 and generate one row per week until tN. For each row R, check whether an event date for that group falls within R's week; if so, count how many weeks it stays alive and set status = 1 (alive); otherwise set the lifetime and status columns for that R to 0, i.e. dead.
Questions:
1) How do I generate the dataframes per group given the t0 and tN values, i.e. generate [group, obs. weekly, lifetime, status] columns with (tN - t0) / week rows?
2) How do I construct the df_new dataframe explained above?
I can begin with this so far =)
import pandas as pd

# 1. Generate a dataframe per group bounded by the `t0` and `tN` from the df_s
#    dataframe, where each dataframe has [group, obs, lifetime, status] columns
#    and (tN - t0) / week rows filled with 0 values.
df_all = pd.concat([df_group1, df_group2])

def do_that(R):
    found_event_rows = df_e.loc[[R.group]]
    # check if found_event_rows['date'] falls into R['obs'] week
    # if True, then find how long it stays there

df_new = df_all.apply(do_that, axis=1)
I'm not really sure I follow, but group one is not related to group two, right? If that's the case, I think what you want is something like this:
import pandas as pd

df_group1 = df_group1.set_index('event date')
df_group1.index = pd.to_datetime(df_group1.index)  # convert the index to datetime so you can resample
df_group1['lifetime, week'] = df_group1.resample('1W').apply(lambda x: yourfunction(x))
df_group1 = df_group1.reset_index()
df_group1['status'] = df_group1.apply(lambda x: 1 if x['lifetime, week'] > 0 else 0, axis=1)
# do the same with group2 and concat to create df_all
I'm not sure how you compute 'lifetime, week', but all that's left is writing the function that generates it.
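For the first question — generating one row per week per group between t0 and tN — a minimal sketch (t0 and tN hardcoded here; in practice they would come from df_s):

import pandas as pd

t0, tN = pd.Timestamp('2016-01-01'), pd.Timestamp('2017-05-25')

# one row per week between t0 and tN, with zeroed lifetime/status columns
skeleton = pd.DataFrame({
    'group': 'x123',
    'obs. weekly': pd.date_range(t0, tN, freq='7D'),
    'lifetime, week': 0,
    'status': 0,
})
print(skeleton.head())  # 2016-01-01, 2016-01-08, 2016-01-15, ...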

Only Some Dates From SQL SELECT Being Set To "0" or "1969-12-31" -- UNIX_TIMESTAMP

So I have been doing pretty well on my project (Link to previous StackOverflow question), and have managed to learn quite a bit, but there is this one problem that has been really dogging me for days and I just can't seem to solve it.
It has to do with using the UNIX_TIMESTAMP call to convert dates in my SQL database to UNIX time-format, but for some reason only one set of dates in my table is giving me issues!
==============
So these are the values I am getting -
#abridged here, see the results from the SELECT statement below to see the rest
#of the fields outputted
| firstVst | nextVst | DOB |
| 1206936000 | 1396238400 | 0 |
| 1313726400 | 1313726400 | 278395200 |
| 1318910400 | 1413604800 | 0 |
| 1319083200 | 1413777600 | 0 |
when I use this SELECT statement -
SELECT SQL_CALC_FOUND_ROWS *, UNIX_TIMESTAMP(firstVst) AS firstVst,
UNIX_TIMESTAMP(nextVst) AS nextVst, UNIX_TIMESTAMP(DOB) AS DOB FROM people
ORDER BY ref DESC;
So my big question is: why in the heck are 3 out of 4 of my DOBs being set to a date of 0 (i.e. 12/31/1969 on my PC)? And why is this not happening in my other fields?
I can see the data quite well using a more simple SELECT statement and the DOB field looks fine...?
#formatting broken to change some variable names etc.
SELECT * FROM people;
| ref | lastName | firstName | DOB | rN | lN | firstVst | disp | repName | nextVst |
| 10001 | BlankA | NameA | 1968-04-15 | 1000000 | 4600000 | 2008-03-31 | Positive | Patrick Smith | 2014-03-31 |
| 10002 | BlankB | NameB | 1978-10-28 | 1000001 | 4600001 | 2011-08-19 | Positive | Patrick Smith | 2011-08-19 |
| 10003 | BlankC | NameC | 1941-06-08 | 1000002 | 4600002 | 2011-10-18 | Positive | Patrick Smith | 2014-10-18 |
| 10004 | BlankD | NameD | 1952-08-01 | 1000003 | 4600003 | 2011-10-20 | Positive | Patrick Smith | 2014-10-20 |
It's because those DOBs are from before 1 January 1970, which is where the Unix epoch starts; anything prior to that would be a negative number of seconds, and MySQL's UNIX_TIMESTAMP() returns 0 for such out-of-range dates instead.
From Wikipedia:
Unix time, or POSIX time, is a system for describing instants in time, defined as the number of seconds that have elapsed since 00:00:00 Coordinated Universal Time (UTC), Thursday, 1 January 1970, not counting leap seconds.
A bit more elaboration: basically, what you're trying to do isn't possible with UNIX_TIMESTAMP(). Depending on what it's for, there may be a different way you can do this, but Unix timestamps probably aren't the best representation for dates like these.
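For a concrete sense of why the epoch matters, a small Python sketch using two DOBs from the sample table shows what the timestamps would have to be:

from datetime import datetime, timezone

epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
for dob in ("1968-04-15", "1978-10-28"):
    d = datetime.strptime(dob, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    print(dob, int((d - epoch).total_seconds()))
# 1968-04-15 -54000000   (negative: before the epoch, which MySQL reports as 0)
# 1978-10-28 278380800   (positive; the table's 278395200 differs only by a time-zone offset)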