How to create new Pandas Dataframe with columns form DataFrame (PYTHON) - pandas

I am creating a DataFrame from a csv file, where my index (rows) is date and my column names are names of cities.
After I create the raw DataFrame, I am trying to create a DataFrame from selected columns. I have tried:
A=df['city1'] #city 1
B=df['city2']
C=pd.merge(A,B)
but it does't work. This is what A and B look like.
Date
2013-11-01 2.56
2013-12-01 1.77
2014-01-01 0.00
2014-02-01 0.38
2014-03-01 13.16
2014-04-01 10.29
2014-05-01 15.43
2014-06-01 11.48
2014-07-01 8.54
2014-08-01 11.11
2014-09-01 2.71
2014-10-01 4.16
2014-11-01 13.01
2014-12-01 9.59
Name: Seattle.Washington, dtype: float64 Date
And this is what I am looking to create:
City1 City2
Date
2013-11-01 0.00 2.94
2013-12-01 8.26 3.41
2014-01-01 1.11 14.27
2014-02-01 32.86 84.26
2014-03-01 34.12 0.00
2014-04-01 68.39 0.00
2014-05-01 27.17 9.09
2014-06-01 10.47 32.00
2014-07-01 14.19 26.83
2014-08-01 14.91 6.36
2014-09-01 3.76 8.32
2014-10-01 5.83 2.19
2014-11-01 10.79 2.64
2014-12-01 21.24 8.08
Any suggestion?
Error Message:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-222-ec50ff9f372f> in <module>()
14 S = df['City1']
15 A = df['City2']
16
---> 17 print merge(S,A)
18 #df2=pd.merge(A,A)
19 #print df2
C:\...\merge.pyc in merge(left, right, how, on, left_on, right_on, left_index, right_index, sort, suffixes, copy)
36 right_on=right_on, left_index=left_index,
37 right_index=right_index, sort=sort, suffixes=suffixes,
---> 38 copy=copy)
39 return op.get_result()
40 if __debug__:

Answer: (Courtesy of #EdChum)
df[['City1', 'City2']]

Related

Future dates calculating incorrectly in FBProphet - make_future_dataframe method

I'm trying to do a weekly forecast in FBProphet for just 5 weeks ahead. The make_future_dataframe method doesn't seem to be working right....makes the correct one week intervals except for one week between jul 3 and Jul 5....every other interval is correct at 7 days or a week. Code and output below:
INPUT DATAFRAME
ds y
548 2010-01-01 3117
547 2010-01-08 2850
546 2010-01-15 2607
545 2010-01-22 2521
544 2010-01-29 2406
... ... ...
4 2020-06-05 2807
3 2020-06-12 2892
2 2020-06-19 3012
1 2020-06-26 3077
0 2020-07-03 3133
CODE
future = m.make_future_dataframe(periods=5, freq='W')
future.tail(9)
OUTPUT
ds
545 2020-06-12
546 2020-06-19
547 2020-06-26
548 2020-07-03
549 2020-07-05
550 2020-07-12
551 2020-07-19
552 2020-07-26
553 2020-08-02
All you need to do is create a dataframe with the dates you need for predict method. utilizing the make_future_dataframe method is not necessary.

how do i access only specific entries of a dataframe having date as index

[this is tail of my DataFrame for around 1000 entries][1]
Open Close High Change mx_profitable
Date
2018-06-06 263.00 270.15 271.4 7.15 8.40
2018-06-08 268.95 273.00 273.9 4.05 4.95
2018-06-11 273.30 274.00 278.4 0.70 5.10
2018-06-12 274.00 282.85 284.4 8.85 10.40
I have to sort out the entries of only certain dates, for example, 25th of every month.
I think need DatetimeIndex.day with boolean indexing:
df[df.index.day == 25]
Sample:
rng = pd.date_range('2017-04-03', periods=1000)
df = pd.DataFrame({'a': range(1000)}, index=rng)
print (df.head())
a
2017-04-03 0
2017-04-04 1
2017-04-05 2
2017-04-06 3
2017-04-07 4
df1 = df[df.index.day == 25]
print (df1.head())
a
2017-04-25 22
2017-05-25 52
2017-06-25 83
2017-07-25 113
2017-08-25 144

Multiply two pandas series matching on series index

I have 2 pandas series
Series 1:
2016-01-31 10
2016-01-31 20
2016-03-31 30
2016-03-31 40
Series 2:
2016-01-31 2
2016-03-31 3
I want to multiply Series 1 and Series 2 matching on index:
Answer
2016-01-31 20
2016-01-31 40
2016-03-31 90
2016-03-31 120
Use mul with parameter fill_value=1:
s = s1.mul(s2, fill_value=1)
print (s)
2016-01-31 20
2016-01-31 40
2016-03-31 90
2016-03-31 120
dtype: int64

seasonal_decompose: operands could not be broadcast together with shapes on a series

I know there are many questions on this topic, but none of them helped me to solve this problem. I'm really stuck on this.
With a simple series:
0
2016-01-31 266
2016-02-29 235
2016-03-31 347
2016-04-30 514
2016-05-31 374
2016-06-30 250
2016-07-31 441
2016-08-31 422
2016-09-30 323
2016-10-31 168
2016-11-30 496
2016-12-31 303
import statsmodels.api as sm
logdf = np.log(df[0])
decompose = sm.tsa.seasonal_decompose(logdf,freq=12, model='additive')
decomplot = decompose.plot()
i keep getting: ValueError: operands could not be broadcast together with shapes (12,) (14,)
I've tried pretty much everything, passing only logdf.values, passing a non-log series. It doesn't work.
Numpy and statsmodel versions:
print(statsmodels.__version__)
print(pd.__version__)
print(np.__version__)
0.6.1
0.18.1
1.11.3
As #yoonforh pointed, in my case this was fixed by setting the freq parameter to less than the time series length. E.g. if your time series ts looks like this:
2014-01-01 0.0
2014-02-01 0.0
2014-03-01 1.0
2014-04-01 1.0
2014-05-01 0.0
2014-06-01 1.0
2014-07-01 1.0
2014-08-01 0.0
2014-09-01 0.0
2014-10-01 1.0
2014-11-01 0.0
2014-12-01 0.0
the shape is
(12,)
so this will give the error as per above:
seasonal_decompose(ts, freq=12, model='additive')
but if I try freq=11 or any other int less than 12, e.g.
seasonal_decompose(ts, freq=11, model='additive')
this works
i noticed that with newer pandas and statsmodel versions it seems to work.
Given a series:
2016-01-03 8.326275
2016-01-10 8.898229
2016-01-17 8.754792
2016-01-24 8.658172
2016-01-31 8.731659
2016-02-07 9.047233
2016-02-14 8.799662
2016-02-21 8.783549
2016-02-28 8.782783
2016-03-06 9.081825
2016-03-13 8.737934
2016-03-20 8.658693
2016-03-27 8.666475
2016-04-03 9.029178
2016-04-10 8.781555
2016-04-17 8.720787
2016-04-24 8.633909
2016-05-01 8.937744
2016-05-08 8.804925
2016-05-15 8.766862
2016-05-22 8.651899
2016-05-29 8.653645
...
And pd/sm version:
statsmodels.__version__ 0.8.0
pandas.__version__ 0.20.1
This is the result:
import statsmodels.api as sm
logdf = np.log(df_series)
decompose = sm.tsa.seasonal_decompose(logdf, model='additive', filt=None, freq=1, two_sided=True)
decompose.plot()
I hope this could solve your problem too.

Pandas - Group into 24-hour blocks, but not midnight-to-midnight

I have a time Series. I'd like to group into into blocks of 24-hour blocks, from 8am to 7:59am the next day. I know how to group by date, but I've tried and failed to handle this 8-hour offset using TimeGroupers and DateOffsets.
I think you can use Grouper with parameter base:
print df
date name
0 2015-06-13 00:21:25 1
1 2015-06-14 01:00:25 2
2 2015-06-14 02:54:48 3
3 2015-06-15 14:38:15 2
4 2015-06-15 15:29:28 1
print df.groupby(pd.Grouper(key='date', freq='24h', base=8)).sum()
name
date
2015-06-12 08:00:00 1.0
2015-06-13 08:00:00 5.0
2015-06-14 08:00:00 NaN
2015-06-15 08:00:00 3.0
alternatively to #jezrael's method you can use your custom grouper function:
start_ts = '2016-01-01 07:59:59'
df = pd.DataFrame({'Date': pd.date_range(start_ts, freq='10min', periods=1000)})
def my_grouper(df, idx):
return df.ix[idx, 'Date'].date() if df.ix[idx, 'Date'].hour >= 8 else df.ix[idx, 'Date'].date() - pd.Timedelta('1day')
df.groupby(lambda x: my_grouper(df, x)).size()
Test:
In [468]: df.head()
Out[468]:
Date
0 2016-01-01 07:59:59
1 2016-01-01 08:09:59
2 2016-01-01 08:19:59
3 2016-01-01 08:29:59
4 2016-01-01 08:39:59
In [469]: df.tail()
Out[469]:
Date
995 2016-01-08 05:49:59
996 2016-01-08 05:59:59
997 2016-01-08 06:09:59
998 2016-01-08 06:19:59
999 2016-01-08 06:29:59
In [470]: df.groupby(lambda x: my_grouper(df, x)).size()
Out[470]:
2015-12-31 1
2016-01-01 144
2016-01-02 144
2016-01-03 144
2016-01-04 144
2016-01-05 144
2016-01-06 144
2016-01-07 135
dtype: int64