Pandas: need to create dataframe for weekly search per event occurrence

If I have this events dataframe df_e below:
| group | event date | count |
|-------|------------|-------|
| x123  | 2016-01-06 | 1     |
|       | 2016-01-08 | 10    |
|       | 2016-02-15 | 9     |
|       | 2016-05-22 | 6     |
|       | 2016-05-29 | 2     |
|       | 2016-05-31 | 6     |
|       | 2016-12-29 | 1     |
| x124  | 2016-01-01 | 1     |
...
and I also know t0, the beginning of time (let's say 2016-01-01 for x123), and tN, the end of the experiment (2017-05-25), from another dataframe df_s. How can I create the dataframe df_new, which should look like this:
| group | obs. weekly | lifetime, week | status |
|-------|-------------|----------------|--------|
| x123  | 2016-01-01  | 1              | 1      |
|       | 2016-01-08  | 0              | 0      |
|       | 2016-01-15  | 0              | 0      |
|       | 2016-01-22  | 1              | 1      |
|       | 2016-01-29  | 2              | 1      |
...
|       | 2017-05-18  | 1              | 1      |
|       | 2017-05-25  | 1              | 1      |
...
| x124  | 2017-05-18  | 1              | 1      |
| x124  | 2017-05-25  | 1              | 1      |
Explanation: take t0 and generate weekly rows until tN. For each row R, check whether an event date for that group falls within R's week. If it does, count how long (in weeks) it lives there and set status = 1 (alive); otherwise set the lifetime and status columns for R to 0 (dead).
Questions:
1) How to generate dataframes per group given t0 and tN values, e.g. generate [group, obs. weekly, lifetime, status] columns for (tN - t0) / week rows?
2) How to accomplish the construction of such df_new dataframe explained above?
I can begin with this so far =)
import pandas as pd

# 1. Generate a dataframe per group bounded by `t0` and `tN` from the df_s
#    dataframe, where each dataframe has [group, obs, lifetime, status]
#    columns and (tN - t0) / week rows filled with 0 values.
df_all = pd.concat([df_group1, df_group2])

def do_that(R):
    found_event_row = df_e.loc[[R.group]]
    # check if found_event_row['date'] falls into R['obs'] week
    # if True, then find how long it has been there

df_new = df_all.apply(do_that, axis=1)

I'm not really sure I follow, but group one is not related to group two, right? If that's the case, I think what you want is something like this:
import pandas as pd

df_group1 = df_group1.set_index('event date')
df_group1.index = pd.to_datetime(df_group1.index)  # convert the index to datetime so you can resample
df_group1['lifetime, week'] = df_group1.resample('1W').apply(lambda x: your_function(x))
df_group1 = df_group1.reset_index()
df_group1['status'] = df_group1['lifetime, week'].apply(lambda x: 1 if x > 0 else 0)
# do the same with group2 and concat to create df_all
I'm not sure how you derive 'lifetime, week', but all that's left is writing the function that generates it.
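To make step 1 concrete, here is a hypothetical sketch (not the asker's final solution): build the weekly observation grid for one group between t0 and tN, then flag each week that contains at least one event. Column names follow the question; the "lifetime" accounting is left out because its definition isn't fully specified.

```python
import pandas as pd

# Toy events for one group (values taken from the question's sample).
df_e = pd.DataFrame({
    'group': ['x123'] * 3,
    'event date': pd.to_datetime(['2016-01-06', '2016-01-08', '2016-02-15']),
})
t0, tN = pd.Timestamp('2016-01-01'), pd.Timestamp('2016-03-01')

# One row per 7-day period from t0 to tN.
weeks = pd.DataFrame({'group': 'x123', 'obs': pd.date_range(t0, tN, freq='7D')})

# Bucket each event into the week it falls in, by flooring its offset from t0
# to a multiple of 7 days.
events = df_e.copy()
events['obs'] = t0 + pd.to_timedelta(
    (events['event date'] - t0).dt.days // 7 * 7, unit='D')
counts = (events.groupby(['group', 'obs']).size()
          .rename('n_events').reset_index())

# Left-merge onto the grid; weeks without events get n_events = 0, status = 0.
df_new = weeks.merge(counts, on=['group', 'obs'], how='left')
df_new['n_events'] = df_new['n_events'].fillna(0).astype(int)
df_new['status'] = (df_new['n_events'] > 0).astype(int)
```

With several groups, the same grid construction would run per row of df_s and the results would be concatenated, as the question's own skeleton suggests.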

Pyspark get rows with max value for a column over a window

I have a dataframe as follows:
| created       | id | date       | value |
|---------------|----|------------|-------|
| 1650983874871 | x  | 2020-05-08 | 5     |
| 1650367659030 | x  | 2020-05-08 | 3     |
| 1639429213087 | x  | 2020-05-08 | 2     |
| 1650983874871 | x  | 2020-06-08 | 5     |
| 1650367659030 | x  | 2020-06-08 | 3     |
| 1639429213087 | x  | 2020-06-08 | 2     |
I want to get the max of created for every date.
The table should look like:
| created       | id | date       | value |
|---------------|----|------------|-------|
| 1650983874871 | x  | 2020-05-08 | 5     |
| 1650983874871 | x  | 2020-06-08 | 5     |
I tried:
df2 = (
    df
    .groupby(['id', 'date'])
    .agg(
        F.max(F.col('created')).alias('created_max')
    )
)
df3 = df.join(df2, on=['id', 'date'], how='left')
But this is not working as expected.
Can anyone help me?
You need to make two changes.
The join condition needs to include created as well. Here I have changed the alias to alias("created") to make the join easier. This ensures a unique join condition (provided there are no duplicate created values).
The join type must be inner.
df2 = (
    df
    .groupby(['id', 'date'])
    .agg(
        F.max(F.col('created')).alias('created')
    )
)
df3 = df.join(df2, on=['id', 'date', 'created'], how='inner')
df3.show()
+---+----------+-------------+-----+
| id| date| created|value|
+---+----------+-------------+-----+
| x|2020-05-08|1650983874871| 5|
| x|2020-06-08|1650983874871| 5|
+---+----------+-------------+-----+
Instead of using the group by and joining, you can also use the Window in pyspark.sql:
from pyspark.sql import functions as func
from pyspark.sql.window import Window
df = df\
    .withColumn('max_created', func.max('created').over(Window.partitionBy('date', 'id')))\
    .filter(func.col('created') == func.col('max_created'))\
    .drop('max_created')
Steps:
1. Get the max value over the window.
2. Keep only the rows whose created matches that max.
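For readers more at home in pandas, the same "keep the rows matching the per-group max" pattern can be sketched in plain pandas (for illustration only, since the question itself is about PySpark): transform('max') plays the role of the window function, and the boolean filter replaces the inner join on ['id', 'date', 'created'].

```python
import pandas as pd

# Reproduce the question's sample data.
df = pd.DataFrame({
    'created': [1650983874871, 1650367659030, 1639429213087,
                1650983874871, 1650367659030, 1639429213087],
    'id': ['x'] * 6,
    'date': ['2020-05-08'] * 3 + ['2020-06-08'] * 3,
    'value': [5, 3, 2, 5, 3, 2],
})

# Per-(id, date) max, broadcast back to every row, then filter.
max_created = df.groupby(['id', 'date'])['created'].transform('max')
df3 = df[df['created'] == max_created].reset_index(drop=True)
```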

How do you control float formatting when using DataFrame.to_markdown in pandas?

I'm trying to use DataFrame.to_markdown with a dataframe that contains float values that I'd like to have rounded off. Without to_markdown() I can just set pd.options.display.float_format and everything works fine, but to_markdown doesn't seem to be respecting that option.
Repro:
import pandas as pd
df = pd.DataFrame([[1, 2, 3], [42.42, 99.11234123412341234, -23]])
pd.options.display.float_format = '{:,.0f}'.format
print(df)
print()
print(df.to_markdown())
outputs:
0 1 2
0 1 2 3
1 42 99 -23
| | 0 | 1 | 2 |
|---:|------:|--------:|----:|
| 0 | 1 | 2 | 3 |
| 1 | 42.42 | 99.1123 | -23 |
(compare the 42.42 and 99.1123 in the to_markdown table to the 42 and 99 in the plain old df)
Is this a bug or am I missing something about how to use to_markdown?
It looks like pandas uses tabulate for this formatting. If it's installed, you can use something like:
df.to_markdown(floatfmt=".0f")
output:
| | 0 | 1 | 2 |
|---:|----:|----:|----:|
| 0 | 1 | 2 | 3 |
| 1 | 42 | 99 | -23 |
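If you'd rather not rely on tabulate's floatfmt at all, one workaround sketch is to bake the rounding into the frame itself before rendering, so any formatter shows whole numbers:

```python
import pandas as pd

# Materialize the rounding in the DataFrame itself, so the markdown output no
# longer depends on how tabulate formats floats.
df = pd.DataFrame([[1, 2, 3], [42.42, 99.11234123412341234, -23]])
df_rounded = df.round(0).astype(int)
# df_rounded.to_markdown() would now render 42 and 99 (tabulate is still
# required for the markdown step itself)
```

If per-column precision is needed, tabulate's floatfmt also accepts a sequence of format strings, one per column (check your tabulate version for the details).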

Iterate through pandas data frame and replace some strings with numbers

I have a dataframe sample_df that looks like:
bar foo
0 rejected unidentified
1 clear caution
2 caution NaN
Note this is just a random made-up df; there are lots of other columns, say with different data types than just text. bar and foo might also have lots of empty cells/values, which are NaNs.
The actual df looks like this (the above is just a sample, btw):
| | Unnamed: 0 | user_id | result | face_comparison_result | created_at | facial_image_integrity_result | visual_authenticity_result | properties | attempt_id |
|-----:|-------------:|:---------------------------------|:---------|:-------------------------|:--------------------|:--------------------------------|:-----------------------------|:----------------|:---------------------------------|
| 0 | 58 | ecee468d4a124a8eafeec61271cd0da1 | clear | clear | 2017-06-20 17:50:43 | clear | clear | {} | 9e4277fc1ddf4a059da3dd2db35f6c76 |
| 1 | 76 | 1895d2b1782740bb8503b9bf3edf1ead | clear | clear | 2017-06-20 13:28:00 | clear | clear | {} | ab259d3cb33b4711b0a5174e4de1d72c |
| 2 | 217 | e71b27ea145249878b10f5b3f1fb4317 | clear | clear | 2017-06-18 21:18:31 | clear | clear | {} | 2b7f1c6f3fc5416286d9f1c97b15e8f9 |
| 3 | 221 | f512dc74bd1b4c109d9bd2981518a9f8 | clear | clear | 2017-06-18 22:17:29 | clear | clear | {} | ab5989375b514968b2ff2b21095ed1ef |
| 4 | 251 | 0685c7945d1349b7a954e1a0869bae4b | clear | clear | 2017-06-18 19:54:21 | caution | clear | {} | dd1b0b2dbe234f4cb747cc054de2fdd3 |
| 5 | 253 | 1a1a994f540147ab913fcd61b7a859d9 | clear | clear | 2017-06-18 20:05:05 | clear | clear | {} | 1475037353a848318a32324539a6947e |
| 6 | 334 | 26e89e4a60f1451285e70ca8dc5bc90e | clear | clear | 2017-06-17 20:21:54 | suspected | clear | {} | 244fa3e7cfdb48afb44844f064134fec |
| 7 | 340 | 41afdea02a9c42098a15d94a05e8452b | NaN | clear | 2017-06-17 20:42:53 | clear | clear | {} | b066a4043122437bafae3ddcf6c2ab07 |
| 8 | 424 | 6cf6eb05a3cc4aabb69c19956a055eb9 | rejected | NaN | 2017-06-16 20:00:26 |
I want to replace any strings I find with numbers, per the below mapping.
def no_strings(df):
    columns = list(df)
    for column in columns:
        df[column] = df[column].map(result_map)

# We will need a mapping of strings to numbers to be able to analyse later.
result_map = {'unidentified': 0, "clear": 1, 'suspected': 2, "caution": 3, 'rejected': 4}
So the output might look like:
bar foo
0 4 0
1 1 3
2 3 NaN
For some reason, when I run no_strings(sample_df) I get errors.
What am I doing wrong?
df['bar'] = df['bar'].map(result_map)
df['foo'] = df['foo'].map(result_map)
df
bar foo
0 4 0
1 1 3
2 3 2
However, if you wish to be on the safe side (in case a key is not in your result_map and you don't want to see a NaN), do this:
df['foo'] = df['foo'].map(lambda x: result_map.get(x, 'not found'))
df['bar'] = df['bar'].map(lambda x: result_map.get(x, 'not found'))
So the output for this df
bar foo
0 rejected unidentified
1 clear caution
2 caution suspected
3 sdgdg 0000
will result in:
bar foo
0 4 0
1 1 3
2 3 2
3 not found not found
To be extra efficient:
cols = ['foo', 'bar', 'other_columns']
for c in cols:
    df[c] = df[c].map(lambda x: result_map.get(x, 'not found'))
Let's try stack, map the dict, and then unstack:
df.stack().to_frame()[0].map(result_map).unstack()
bar foo
0 4 0
1 1 3
2 3 2
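A likely cause of the errors in the question's no_strings: it maps every column, and Series.map with a dict can blow up on columns holding unhashable values (such as the {} dicts in properties), while every unmapped value silently becomes NaN. Here is a minimal fixed sketch that only touches the named columns (the explicit column list is an assumption added for illustration):

```python
import pandas as pd
import numpy as np

# Mapping of strings to numbers, defined before it is used.
result_map = {'unidentified': 0, 'clear': 1, 'suspected': 2,
              'caution': 3, 'rejected': 4}

def no_strings(df, columns):
    # Map only the columns known to hold these strings, leaving other dtypes
    # (dates, dicts, ids) untouched; unknown strings become NaN.
    out = df.copy()
    for column in columns:
        out[column] = out[column].map(result_map)
    return out

sample_df = pd.DataFrame({'bar': ['rejected', 'clear', 'caution'],
                          'foo': ['unidentified', 'caution', np.nan]})
mapped = no_strings(sample_df, ['bar', 'foo'])
```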

df.replace not having any effect when trying to replace dates in pandas dataframe

I've been through the various comments on here about df.replace but I'm still not able to get it working.
Here is a snippet of my code:
# Name columns
df_yearly.columns = ['symbol', 'date', 'annual % price change']
# Change date format to D/M/Y
df_yearly['date'] = pd.to_datetime(df_yearly['date'], format='%d/%m/%Y')
The df_yearly dataframe looks like this:
|   | symbol | date       | annual % price change |
|---|--------|------------|-----------------------|
| 0 | APX    | 12/31/2017 |                       |
| 1 | APX    | 12/31/2018 | -0.502554278          |
| 2 | AURA   | 12/31/2018 | -0.974450706          |
| 3 | BASH   | 12/31/2016 | -0.998110828          |
| 4 | BASH   | 12/31/2017 | 8.989361702           |
| 5 | BASH   | 12/31/2018 | -0.083599574          |
| 6 | BCC    | 12/31/2017 | 121718.9303           |
| 7 | BCC    | 12/31/2018 | -0.998018734          |
I want to replace all dates of 12/31/2018 with 06/30/2018. The next section of my code is:
# Replace 31-12-2018 with 30-06-2018 as this is final date in monthly DF
df_yearly_1 = df_yearly.date.replace('31-12-2018', '30-06-2018')
print(df_yearly_1)
But the output is still coming as:
| 0 | 2017-12-31
| 1 | 2018-12-31
| 2 | 2018-12-31
| 3 | 2016-12-31
| 4 | 2017-12-31
| 5 | 2018-12-31
Is anyone able to help me with this? I thought it might be because I had the date format wrong in my df.replace statement, but I've also tried searching for and replacing 12-31-2018, and it still does nothing.
Thanks in advance!!
Try '.astype(str).replace':
df_yearly['date'] = df_yearly['date'].astype(str).replace('2018-12-31', '2018-06-30')
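An alternative sketch that keeps the column as datetime instead of casting to str: select the rows whose date matches and assign the replacement Timestamp directly (the toy frame here is for illustration):

```python
import pandas as pd

# Keep the datetime dtype: build a boolean mask for 2018-12-31 and assign the
# replacement date in place, instead of round-tripping through strings.
df_yearly = pd.DataFrame({
    'symbol': ['APX', 'APX', 'AURA'],
    'date': pd.to_datetime(['2017-12-31', '2018-12-31', '2018-12-31']),
})
mask = df_yearly['date'] == pd.Timestamp('2018-12-31')
df_yearly.loc[mask, 'date'] = pd.Timestamp('2018-06-30')
```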

How to define a sub query inside SQL statement to be used several times as a table alias?

I have an MS Access database of rainfall data for several climate stations.
For each day at each station, I want to calculate the rainfall on the previous day (if recorded) and the sum of the rainfall over the previous 3 and 7 days.
Due to the huge amount of data and the limitations of Access, I made a query that processes station by station. I then applied an auxiliary query to find the dates first. For each station, the following SQL statement is applied (saved as the RainFallStudy query):
SELECT
[173].ID, [173].AirportCode, [173].RFmm,
DateSerial([rYear], [rMonth], [rDay]) AS DateSer,
[DateSer]-1 AS DM1,
[DateSer]-2 AS DM2,
[DateSer]-3 AS DM3,
[DateSer]-4 AS DM4,
[DateSer]-5 AS DM5,
[DateSer]-6 AS DM6,
[DateSer]-7 AS DM7
FROM
[173]
WHERE
((([173].AirportCode) = 786660));
I used DM1, DM2, etc. as the date serial of day-1, day-2, and so on.
Then I used another query that left-joins RainFallStudy to itself seven times:
The SQL statement is
SELECT
RainFallStudy.ID, RainFallStudy.AirportCode,
RainFallStudy.RFmm AS RF0, RainFallStudy.DateSer,
RainFallStudy.DM1, RainFallStudy_1.RFmm AS RF1,
RainFallStudy_2.RFmm AS RF2, RainFallStudy_3.RFmm AS RF3,
RainFallStudy_4.RFmm AS RF4, RainFallStudy_5.RFmm AS RF5,
RainFallStudy_6.RFmm AS RF6, RainFallStudy_7.RFmm AS RF7,
Nz([rf1], 0) + Nz([rf2], 0) + Nz([rf3], 0) + Nz([rf4], 0) + Nz([rf5], 0) + Nz([rf6], 0) + Nz([rf7], 0) AS RF_W
FROM
((((((RainFallStudy
LEFT JOIN
RainFallStudy AS RainFallStudy_1 ON RainFallStudy.DM1 = RainFallStudy_1.DateSer)
LEFT JOIN
RainFallStudy AS RainFallStudy_2 ON RainFallStudy.DM2 = RainFallStudy_2.DateSer)
LEFT JOIN
RainFallStudy AS RainFallStudy_3 ON RainFallStudy.DM3 = RainFallStudy_3.DateSer)
LEFT JOIN
RainFallStudy AS RainFallStudy_4 ON RainFallStudy.DM4 = RainFallStudy_4.DateSer)
LEFT JOIN
RainFallStudy AS RainFallStudy_5 ON RainFallStudy.DM5 = RainFallStudy_5.DateSer)
LEFT JOIN
RainFallStudy AS RainFallStudy_6 ON RainFallStudy.DM6 = RainFallStudy_6.DateSer)
LEFT JOIN
RainFallStudy AS RainFallStudy_7 ON RainFallStudy.DM7 = RainFallStudy_7.DateSer;
Now I suffer from the slow performance of this query, as each station has between 1,000 and 750,000 records! Is there a better way to get what I need with a faster SQL statement? The second question: can I make it a standalone SQL statement (one query without the auxiliary query), since I will run it from Python, which (to the best of my knowledge) takes a single SQL statement?
Thanks in advance.
Update
As requested by @Andre, here is some sample data from table [173]:
| ID    | AirportCode | rYear | rMonth | rDay | RFmm  |
|-------|-------------|-------|--------|------|-------|
| 11216 | 409040      | 2012  | 1      | 23   | 0.51  |
| 11217 | 409040      | 2012  | 1      | 24   | 0     |
| 11218 | 409040      | 2012  | 1      | 25   | 0     |
| 11219 | 409040      | 2012  | 1      | 26   | 2.03  |
| 11220 | 409040      | 2012  | 1      | 27   | 0     |
| 11221 | 409040      | 2012  | 1      | 28   | 0     |
| 11222 | 409040      | 2012  | 1      | 29   | 0     |
| 11223 | 409040      | 2012  | 1      | 30   | 0     |
| 11224 | 409040      | 2012  | 1      | 31   | 0.25  |
| 11225 | 409040      | 2012  | 2      | 1    | 0     |
| 11226 | 409040      | 2012  | 2      | 2    | 0     |
| 11227 | 409040      | 2012  | 2      | 3    | 4.32  |
| 11228 | 409040      | 2012  | 2      | 4    | 13.21 |
| 11229 | 409040      | 2012  | 2      | 5    | 1.02  |
| 11230 | 409040      | 2012  | 2      | 6    | 0     |
| 11231 | 409040      | 2012  | 2      | 7    | 0     |
| 11232 | 409040      | 2012  | 2      | 8    | 0     |
| 11233 | 409040      | 2012  | 2      | 9    | 0     |
| 11234 | 409040      | 2012  | 2      | 10   | 5.08  |
| 11235 | 409040      | 2012  | 2      | 11   | 0     |
| 11236 | 409040      | 2012  | 2      | 12   | 12.95 |
| 11237 | 409040      | 2012  | 2      | 13   | 5.59  |
| 11238 | 409040      | 2012  | 2      | 14   | 0.25  |
| 11239 | 409040      | 2012  | 2      | 15   | 0     |
| 11240 | 409040      | 2012  | 2      | 16   | 0     |
| 11241 | 409040      | 2012  | 2      | 17   | 0     |
| 11242 | 409040      | 2012  | 2      | 18   | 0     |
| 11243 | 409040      | 2012  | 2      | 19   | 0     |
| 11244 | 409040      | 2012  | 2      | 20   | 14.48 |
| 11245 | 409040      | 2012  | 2      | 21   | 9.65  |
| 11246 | 409040      | 2012  | 2      | 22   | 3.05  |
| 11247 | 409040      | 2012  | 2      | 23   | 0     |
| 11248 | 409040      | 2012  | 2      | 24   | 0     |
| 11249 | 409040      | 2012  | 2      | 25   | 0     |
| 11250 | 409040      | 2012  | 2      | 26   | 0     |
| 11251 | 409040      | 2012  | 2      | 27   | 0     |
| 11252 | 409040      | 2012  | 2      | 28   | 7.37  |
| 11253 | 409040      | 2012  | 2      | 29   | 0     |
And here is the desired sample output:
| ID    | AirportCode | DateSer    | ThisDay | Yesterday | Prev3days | PrevWeek |
|-------|-------------|------------|---------|-----------|-----------|----------|
| 11216 | 409040      | 23-01-2012 | 0.51    | 0         | 0         | 0        |
| 11217 | 409040      | 24-01-2012 | 0       | 0.51      | 0.51      | 0.51     |
| 11218 | 409040      | 25-01-2012 | 0       | 0         | 0.51      | 0.51     |
| 11219 | 409040      | 26-01-2012 | 2.03    | 0         | 0.51      | 0.51     |
| 11220 | 409040      | 27-01-2012 | 0       | 2.03      | 2.03      | 2.54     |
| 11221 | 409040      | 28-01-2012 | 0       | 0         | 2.03      | 2.54     |
| 11222 | 409040      | 29-01-2012 | 0       | 0         | 2.03      | 2.54     |
| 11223 | 409040      | 30-01-2012 | 0       | 0         | 0         | 2.54     |
| 11224 | 409040      | 31-01-2012 | 0.25    | 0         | 0         | 2.03     |
| 11225 | 409040      | 01-02-2012 | 0       | 0.25      | 0.25      | 2.28     |
| 11226 | 409040      | 02-02-2012 | 0       | 0         | 0.25      | 2.28     |
| 11227 | 409040      | 03-02-2012 | 4.32    | 0         | 0.25      | 0.25     |
| 11228 | 409040      | 04-02-2012 | 13.21   | 4.32      | 4.32      | 4.57     |
| 11229 | 409040      | 05-02-2012 | 1.02    | 13.21     | 17.53     | 17.78    |
| 11230 | 409040      | 06-02-2012 | 0       | 1.02      | 18.55     | 18.8     |
| 11231 | 409040      | 07-02-2012 | 0       | 0         | 14.23     | 18.8     |
| 11232 | 409040      | 08-02-2012 | 0       | 0         | 1.02      | 18.55    |
| 11233 | 409040      | 09-02-2012 | 0       | 0         | 0         | 18.55    |
| 11234 | 409040      | 10-02-2012 | 5.08    | 0         | 0         | 18.55    |
| 11235 | 409040      | 11-02-2012 | 0       | 5.08      | 5.08      | 19.31    |
| 11236 | 409040      | 12-02-2012 | 12.95   | 0         | 5.08      | 6.1      |
| 11237 | 409040      | 13-02-2012 | 5.59    | 12.95     | 18.03     | 18.03    |
| 11238 | 409040      | 14-02-2012 | 0.25    | 5.59      | 18.54     | 23.62    |
| 11239 | 409040      | 15-02-2012 | 0       | 0.25      | 18.79     | 23.87    |
| 11240 | 409040      | 16-02-2012 | 0       | 0         | 5.84      | 23.87    |
| 11241 | 409040      | 17-02-2012 | 0       | 0         | 0.25      | 23.87    |
| 11242 | 409040      | 18-02-2012 | 0       | 0         | 0         | 18.79    |
| 11243 | 409040      | 19-02-2012 | 0       | 0         | 0         | 18.79    |
| 11244 | 409040      | 20-02-2012 | 14.48   | 0         | 0         | 5.84     |
| 11245 | 409040      | 21-02-2012 | 9.65    | 14.48     | 14.48     | 14.73    |
| 11246 | 409040      | 22-02-2012 | 3.05    | 9.65      | 24.13     | 24.13    |
| 11247 | 409040      | 23-02-2012 | 0       | 3.05      | 27.18     | 27.18    |
| 11248 | 409040      | 24-02-2012 | 0       | 0         | 12.7      | 27.18    |
| 11249 | 409040      | 25-02-2012 | 0       | 0         | 3.05      | 27.18    |
| 11250 | 409040      | 26-02-2012 | 0       | 0         | 0         | 27.18    |
| 11251 | 409040      | 27-02-2012 | 0       | 0         | 0         | 27.18    |
| 11252 | 409040      | 28-02-2012 | 7.37    | 0         | 0         | 12.7     |
| 11253 | 409040      | 29-02-2012 | 0       | 7.37      | 7.37      | 10.42    |
I created an additional column rDate (DateTime) and filled it with this query:
UPDATE Rainfall SET Rainfall.rDate = DateSerial([rYear],[rMonth],[rDay]);
Then your desired result can be achieved with several subqueries, using SUM() for the last two columns:
SELECT r.ID, r.AirportCode, r.rDate, r.RFmm,
(SELECT RFmm FROM Rainfall r1 WHERE r1.AirportCode = r.AirportCode AND r1.rDate = r.rDate-1) AS Yesterday,
(SELECT SUM(RFmm) FROM Rainfall r3 WHERE r3.AirportCode = r.AirportCode AND r3.rDate BETWEEN r.rDate-3 AND r.rDate-1) AS Prev3days,
(SELECT SUM(RFmm) FROM Rainfall r7 WHERE r7.AirportCode = r.AirportCode AND r7.rDate BETWEEN r.rDate-7 AND r.rDate-1) AS PrevWeek
FROM Rainfall r
Make sure AirportCode and rDate are indexed for larger numbers of records.
Result:
+-------+-------------+------------+-------+-----------+-----------+----------+
| ID | AirportCode | rDate | RFmm | Yesterday | Prev3days | PrevWeek |
+-------+-------------+------------+-------+-----------+-----------+----------+
| 11216 | 409040 | 23.01.2012 | 0,51 | | | |
| 11217 | 409040 | 24.01.2012 | 0 | 0,51 | 0,51 | 0,51 |
| 11218 | 409040 | 25.01.2012 | 0 | 0 | 0,51 | 0,51 |
| 11219 | 409040 | 26.01.2012 | 2,03 | 0 | 0,51 | 0,51 |
| 11220 | 409040 | 27.01.2012 | 0 | 2,03 | 2,03 | 2,54 |
| 11221 | 409040 | 28.01.2012 | 0 | 0 | 2,03 | 2,54 |
| 11222 | 409040 | 29.01.2012 | 0 | 0 | 2,03 | 2,54 |
| 11223 | 409040 | 30.01.2012 | 0 | 0 | 0 | 2,54 |
| 11224 | 409040 | 31.01.2012 | 0,25 | 0 | 0 | 2,03 |
| 11225 | 409040 | 01.02.2012 | 0 | 0,25 | 0,25 | 2,28 |
| 11226 | 409040 | 02.02.2012 | 0 | 0 | 0,25 | 2,28 |
| 11227 | 409040 | 03.02.2012 | 4,32 | 0 | 0,25 | 0,25 |
| 11228 | 409040 | 04.02.2012 | 13,21 | 4,32 | 4,32 | 4,57 |
| 11229 | 409040 | 05.02.2012 | 1,02 | 13,21 | 17,53 | 17,78 |
+-------+-------------+------------+-------+-----------+-----------+----------+
Use Nz() to avoid NULL values in the first row.
It appears that you store the date in separate fields (rYear, rMonth, rDay), so in order to get the date you use the DateSerial function. This means that whenever the date is used in a join or WHERE clause, Access must calculate it for the entire table. Store the date in a separate, indexed field to avoid this calculation.
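Since the results will be consumed from Python anyway, the same trailing sums can be sketched in pandas (a hypothetical alternative, not the Access answer above; it assumes one row per calendar day per station with no gaps, otherwise a time-based rolling window over a date index would be needed):

```python
import pandas as pd

# Per-station trailing rainfall: yesterday's value, plus the 3-day and 7-day
# sums ending yesterday.  shift(1) excludes the current day, matching the
# BETWEEN rDate-3 AND rDate-1 logic of the SQL answer.
rain = pd.DataFrame({
    'AirportCode': [409040] * 6,
    'rDate': pd.date_range('2012-01-23', periods=6, freq='D'),
    'RFmm': [0.51, 0.0, 0.0, 2.03, 0.0, 0.0],
}).sort_values(['AirportCode', 'rDate']).reset_index(drop=True)

rain['Yesterday'] = rain.groupby('AirportCode')['RFmm'].shift(1)
rain['Prev3days'] = rain.groupby('AirportCode')['Yesterday'].transform(
    lambda s: s.rolling(3, min_periods=1).sum())
rain['PrevWeek'] = rain.groupby('AirportCode')['Yesterday'].transform(
    lambda s: s.rolling(7, min_periods=1).sum())
```

On the toy data this reproduces the sample output row for 27-01-2012 (Yesterday 2.03, Prev3days 2.03, PrevWeek 2.54).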