iterate over list of dicts to find minimum values per unique value - python-3.8

I am looking for the minimum of 'diff' per unique 'expiry' in the following list of dicts. I'm guessing itertools.groupby can help, but I'm really lost on how to apply it.
import datetime as dt

data = [{'expiry': dt.datetime(2020, 6, 26), 'strike': 138.0, 'diff': 0.305},
        {'expiry': dt.datetime(2020, 6, 26), 'strike': 138.5, 'diff': 0.188},
        {'expiry': dt.datetime(2020, 6, 26), 'strike': 139.0, 'diff': 0.688},
        {'expiry': dt.datetime(2020, 7, 24), 'strike': 137.0, 'diff': 0.805},
        {'expiry': dt.datetime(2020, 7, 24), 'strike': 137.5, 'diff': 0.305},
        {'expiry': dt.datetime(2020, 7, 24), 'strike': 138.0, 'diff': 0.203},
        {'expiry': dt.datetime(2020, 7, 24), 'strike': 138.5, 'diff': 0.703}]
desired output:
[{'expiry': dt.datetime(2020,6,26), 'strike': 138.5, 'diff': 0.188},
{'expiry': dt.datetime(2020,7,24), 'strike': 138.0, 'diff': 0.203}]
Looking for an efficient and fast way to solve this (without using pandas).

m = dict()
for d in data:
    m.update({
        d['expiry']: min(
            d['diff'], m[d['expiry']] if d['expiry'] in m else d['diff']
        )
    })

for exp, min_of_diff in m.items():
    print('{},{}'.format(exp, min_of_diff))
or if you want access to the whole dict:
m = dict()
for d in data:
    if d['expiry'] not in m:
        m.update({
            d['expiry']: d
        })
        continue
    # otherwise, keep the entry with the lesser diff
    smallest_diff = m[d['expiry']]['diff']
    if d['diff'] < smallest_diff:
        m.update({
            d['expiry']: d
        })

for exp, min_dict in m.items():
    print('{},{}'.format(exp, min_dict))
I haven't tested these solutions, but they should give you the idea.
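Since the question mentions itertools.groupby, here is a minimal sketch of that approach (the same idea as the pure-Python groupby answer to a later question on this page). Note that groupby only groups consecutive items, so the data must be sorted by the grouping key first:

from itertools import groupby
from operator import itemgetter

# groupby only groups consecutive items, so sort by the grouping key first
data_sorted = sorted(data, key=itemgetter('expiry'))

# within each expiry group, keep the dict with the smallest 'diff'
result = [min(group, key=itemgetter('diff'))
          for _, group in groupby(data_sorted, key=itemgetter('expiry'))]
print(result)

This returns exactly the two dicts from the desired output above.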


plotly imagesc repeat yaxis labels

I have a wide matrix that I render using plotly express. Let's say:
import plotly.express as px

data = [[1, 25, 30, 50, 1], [20, 1, 60, 80, 30], [30, 60, 1, 5, 20]]
fig = px.imshow(data,
                labels=dict(x="Day of Week", y="Time of Day", color="Productivity"),
                x=['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday'],
                y=['Morning', 'Afternoon', 'Evening']
                )
fig.update_xaxes(side="top")
fig.layout.height = 500
fig.layout.width = 500
fig.show()
To enhance readability, I would like to repeat (i.e. add an identical) y-axis on the right side of the matrix.
I tried to follow this:
fig.update_layout(xaxis=dict(domain=[0.3, 0.7]),
                  # create 1st y axis
                  yaxis=dict(title="yaxis1 title"),
                  # create 2nd y axis
                  yaxis2=dict(title="yaxis2 title", anchor="x", overlaying="y",
                              side="right")
                  )
but I cannot make it work with imshow as it does not accept a yaxis argument.
Any workarounds?
Found an answer through the plotly forum:
import plotly.graph_objects as go
from plotly.subplots import make_subplots

# Create figure with secondary y-axis
fig = make_subplots(specs=[[{"secondary_y": True}]])

data = [[1, 25, 30, 50, 1], [20, 1, 60, 80, 30], [30, 60, 1, 5, 20]]
fig.add_trace(go.Heatmap(
    z=data,
    x=['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday'],
    y=['Morning', 'Afternoon', 'Evening']
), secondary_y=False)
fig.add_trace(go.Heatmap(
    z=data,
    x=['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday'],
    y=['Morning', 'Afternoon', 'Evening']
), secondary_y=True)
fig.update_xaxes(side="top")
fig.update_layout(xaxis_title="Day of Week", yaxis_title="Time of Day")
fig.show()
Note that adding the trace twice may be suboptimal, but it works.
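If the duplicated trace also draws a second colorbar, a small refinement is to hide the scale on the mirrored trace. A minimal sketch (showscale is a standard go.Heatmap attribute):

import plotly.graph_objects as go
from plotly.subplots import make_subplots

fig = make_subplots(specs=[[{"secondary_y": True}]])
data = [[1, 25, 30, 50, 1], [20, 1, 60, 80, 30], [30, 60, 1, 5, 20]]
days = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday']
times = ['Morning', 'Afternoon', 'Evening']

# the first trace carries the colorbar
fig.add_trace(go.Heatmap(z=data, x=days, y=times), secondary_y=False)
# the mirrored trace repeats the y-axis labels but draws no second colorbar
fig.add_trace(go.Heatmap(z=data, x=days, y=times, showscale=False), secondary_y=True)

fig.update_xaxes(side="top")
fig.show()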

How do I get API data into a Pandas DataFrame?

I am pulling in betting data from an API and I would like to get it into a DataFrame. I would like the DataFrame to have the following columns: [away_team, home_team, spread, overUnder] I am using the following code:
from __future__ import print_function
import time
import cfbd
from cfbd.rest import ApiException
from pprint import pprint

configuration = cfbd.Configuration()
configuration.api_key['Authorization'] = 'XXX'
configuration.api_key_prefix['Authorization'] = 'Bearer'

# create an instance of the API class
api_instance = cfbd.BettingApi(cfbd.ApiClient(configuration))

game_id = 56                        # int | Game id filter (optional)
year = 56                           # int | Year/season filter for games (optional)
week = 56                           # int | Week filter (optional)
season_type = 'regular'             # str | Season type filter (regular or postseason) (optional) (default to regular)
team = 'team_example'               # str | Team (optional)
home = 'home_example'               # str | Home team filter (optional)
away = 'away_example'               # str | Away team filter (optional)
conference = 'conference_example'   # str | Conference abbreviation filter (optional)

try:
    # Betting lines
    api_response = api_instance.get_lines(year=2021, week=7, season_type='regular', conference='SEC')
    pprint(api_response)
except ApiException as e:
    print("Exception when calling BettingApi->get_lines: %s\n" % e)
API Response:
[{'away_conference': 'SEC',
'away_score': None,
'away_team': 'Auburn',
'home_conference': 'SEC',
'home_score': None,
'home_team': 'Arkansas',
'id': 401282104,
'lines': [{'awayMoneyline': 155,
'formattedSpread': 'Arkansas -3.5',
'homeMoneyline': -180,
'overUnder': '53.5',
'overUnderOpen': '53.0',
'provider': 'Bovada',
'spread': '-3.5',
'spreadOpen': '-3.5'}],
'season': 2021,
'season_type': 'regular',
'start_date': None,
'week': 7},
{'away_conference': 'SEC',
'away_score': None,
'away_team': 'Kentucky',
'home_conference': 'SEC',
'home_score': None,
'home_team': 'Georgia',
'id': 401282105,
'lines': [{'awayMoneyline': 1000,
'formattedSpread': 'Georgia -23.5',
'homeMoneyline': -2200,
'overUnder': '44.5',
'overUnderOpen': '44.5',
'provider': 'Bovada',
'spread': '-23.5',
'spreadOpen': '-23.5'}],
'season': 2021,
'season_type': 'regular',
'start_date': None,
'week': 7},
{'away_conference': 'SEC',
'away_score': None,
'away_team': 'Florida',
'home_conference': 'SEC',
'home_score': None,
'home_team': 'LSU',
'id': 401282106,
'lines': [{'awayMoneyline': -370,
'formattedSpread': 'Florida -10.0',
'homeMoneyline': 285,
'overUnder': '58.5',
'overUnderOpen': '58.0',
'provider': 'Bovada',
'spread': '10.0',
'spreadOpen': '10.0'}],
'season': 2021,
'season_type': 'regular',
'start_date': None,
'week': 7},
{'away_conference': 'SEC',
'away_score': None,
'away_team': 'Alabama',
'home_conference': 'SEC',
'home_score': None,
'home_team': 'Mississippi State',
'id': 401282107,
'lines': [{'awayMoneyline': -950,
'formattedSpread': 'Alabama -17.5',
'homeMoneyline': 600,
'overUnder': '57.5',
'overUnderOpen': '59.0',
'provider': 'Bovada',
'spread': '17.5',
'spreadOpen': '17.0'}],
'season': 2021,
'season_type': 'regular',
'start_date': None,
'week': 7},
{'away_conference': 'SEC',
'away_score': None,
'away_team': 'Texas A&M',
'home_conference': 'SEC',
'home_score': None,
'home_team': 'Missouri',
'id': 401282108,
'lines': [{'awayMoneyline': -310,
'formattedSpread': 'Texas A&M -9.0',
'homeMoneyline': 255,
'overUnder': '60.5',
'overUnderOpen': '61.0',
'provider': 'Bovada',
'spread': '9.0',
'spreadOpen': '9.0'}],
'season': 2021,
'season_type': 'regular',
'start_date': None,
'week': 7},
{'away_conference': 'SEC',
'away_score': None,
'away_team': 'Vanderbilt',
'home_conference': 'SEC',
'home_score': None,
'home_team': 'South Carolina',
'id': 401282109,
'lines': [{'awayMoneyline': 750,
'formattedSpread': 'South Carolina -18.5',
'homeMoneyline': -1400,
'overUnder': '51.0',
'overUnderOpen': '51.0',
'provider': 'Bovada',
'spread': '-18.5',
'spreadOpen': '-20.0'}],
'season': 2021,
'season_type': 'regular',
'start_date': None,
'week': 7},
{'away_conference': 'SEC',
'away_score': None,
'away_team': 'Ole Miss',
'home_conference': 'SEC',
'home_score': None,
'home_team': 'Tennessee',
'id': 401282110,
'lines': [{'awayMoneyline': -150,
'formattedSpread': 'Ole Miss -3.0',
'homeMoneyline': 130,
'overUnder': '80.5',
'overUnderOpen': '78.0',
'provider': 'Bovada',
'spread': '3.0',
'spreadOpen': '3.0'}],
'season': 2021,
'season_type': 'regular',
'start_date': None,
'week': 7}]
I need help getting this output into a DataFrame. Thank you in advance.
You could iterate over the JSON data, extracting the information that you need and creating a new structure to hold it. After iterating over all your data, you can create a DataFrame from what you extracted.
I made an example using a dataclass to store the data you need:
import json
import pandas as pd
from dataclasses import dataclass

@dataclass
class BettingData:
    away_team: str
    home_team: str
    spread: str
    overUnder: str

json_data = json.loads(open('sample_data.json', 'r').read())

content = []
for entry in json_data:
    for line in entry['lines']:
        data = BettingData(away_team=entry['away_team'],
                           home_team=entry['home_team'],
                           spread=line['spread'],
                           overUnder=line['overUnder'])
        content.append(data)

df = pd.DataFrame(content)
print(df)
And the output is:
    away_team          home_team  spread  overUnder
0      Auburn           Arkansas    -3.5       53.5
1    Kentucky            Georgia   -23.5       44.5
2     Florida                LSU    10.0       58.5
3     Alabama  Mississippi State    17.5       57.5
4   Texas A&M           Missouri     9.0       60.5
5  Vanderbilt     South Carolina   -18.5       51.0
6    Ole Miss          Tennessee     3.0       80.5
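As an alternative sketch (assuming the API response has already been converted to plain dicts/JSON, as with the sample_data.json above), pandas can flatten the nested 'lines' records directly with pd.json_normalize:

import pandas as pd

# record_path descends into each game's 'lines' list;
# meta copies the game-level fields onto every line row
df = pd.json_normalize(json_data, record_path='lines',
                       meta=['away_team', 'home_team'])
df = df[['away_team', 'home_team', 'spread', 'overUnder']]
print(df)

This produces one row per line per game, which matters if a game ever has more than one provider in 'lines'.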

create list from values in list of dicts based on condition

I am looking to create a list/np.array for each unique expiry date in the following list of dicts:
import datetime as dt
import numpy as np

data = [{'expiry': dt.datetime(2020, 6, 26, 21, 0), 'strike': 137.0},
        {'expiry': dt.datetime(2020, 6, 26, 21, 0), 'strike': 137.25},
        {'expiry': dt.datetime(2020, 6, 26, 21, 0), 'strike': 137.5},
        {'expiry': dt.datetime(2020, 7, 24, 21, 0), 'strike': 136.5},
        {'expiry': dt.datetime(2020, 7, 24, 21, 0), 'strike': 137.0},
        {'expiry': dt.datetime(2020, 7, 24, 21, 0), 'strike': 137.5},
        {'expiry': dt.datetime(2020, 7, 24, 21, 0), 'strike': 138.0}]
I can get the unique expiry dates like so:
exp = np.unique([np.array([d['expiry']]) for d in data])
the desired output is:
[[137.0, 137.25, 137.5], [136.5, 137.0, 137.5, 138.0]]
Using your exp:
[[y['strike'] for y in data if y['expiry'] == x] for x in exp]
Output:
[[137.0, 137.25, 137.5], [136.5, 137.0, 137.5, 138.0]]
As an alternative to @AllaTarighati's solution, you can also use the return_inverse option of np.unique:
exp, ind = np.unique([np.array([d['expiry']]) for d in data], return_inverse=True)
strike = [[data[i]['strike'] for i, j in enumerate(ind) if j == k] for k in range(exp.size)]
Here is a solution without any additional comparisons (j==k):
exp, ind = np.unique([np.array([d['expiry']]) for d in data], return_inverse=True)
strike = [[] for _ in range(exp.size)]
for i, j in enumerate(ind):
    strike[j].append(data[i]['strike'])
Output of print(strike) for both sample codes is:
[[137.0, 137.25, 137.5], [136.5, 137.0, 137.5, 138.0]]
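For completeness, here is a plain-Python sketch without NumPy, assuming the groups should appear in order of first occurrence (which dicts guarantee in Python 3.7+):

from collections import defaultdict

# group strikes by expiry in a single pass
groups = defaultdict(list)
for d in data:
    groups[d['expiry']].append(d['strike'])

strike = list(groups.values())
print(strike)  # [[137.0, 137.25, 137.5], [136.5, 137.0, 137.5, 138.0]]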

iterate over unique elements of df.index to find minimum in column

My df looks like this:
import datetime as dt
import pandas as pd

data = [{'expiry': dt.datetime(2020, 6, 26), 'strike': 137.5, 'diff': 0.797},
        {'expiry': dt.datetime(2020, 6, 26), 'strike': 138.0, 'diff': 0.305},
        {'expiry': dt.datetime(2020, 6, 26), 'strike': 138.5, 'diff': 0.188},
        {'expiry': dt.datetime(2020, 6, 26), 'strike': 139.0, 'diff': 0.688},
        {'expiry': dt.datetime(2020, 7, 24), 'strike': 137.5, 'diff': 0.805},
        {'expiry': dt.datetime(2020, 7, 24), 'strike': 138.0, 'diff': 0.305},
        {'expiry': dt.datetime(2020, 7, 24), 'strike': 138.5, 'diff': 0.203},
        {'expiry': dt.datetime(2020, 7, 24), 'strike': 139.0, 'diff': 0.703}]
df = pd.DataFrame(data).set_index('expiry')
I am looking to find the minimum per unique index (expiry). The following works but is rather slow. I am looking for a faster way to do this, either in pure Python, NumPy or pandas.
atm_df = pd.DataFrame()
for date in df.index.unique():
    _df = df.loc[date]
    atm_df = atm_df.append(_df.loc[(_df['diff'] == _df['diff'].min())])
atm_df
Desired output looks like this (but I don't mind if this is a df or a dict):
            strike   diff
expiry
2020-06-26   138.5  0.188
2020-07-24   138.5  0.203
min works with level, and then you can use eq to compare the series with the extracted min:
df[df['diff'].eq(df['diff'].min(level=0))]
Output:
            strike   diff
expiry
2020-06-26   138.5  0.188
2020-07-24   138.5  0.203
One based on np.minimum.reduceat -
import numpy as np

sidx = df.index.argsort()
df_s = df.iloc[sidx]

I = df_s.index.values
cutidx = np.flatnonzero(np.r_[True, I[:-1] != I[1:]])
out = np.minimum.reduceat(df_s.values, cutidx, axis=0)
df_out = pd.DataFrame(out, index=I[cutidx], columns=df_s.columns)
If the input dataframe is already sorted by index, use df as df_s directly.
You can use pandas groupby on the index and aggregate with min to get the minimum for the diff column. Compare the result of the grouping with the values in diff, then index the dataframe with the resulting boolean.
df.loc[df['diff'].eq(df.groupby(level=0)['diff'].min())]

            strike   diff
expiry
2020-06-26   138.5  0.188
2020-07-24   138.5  0.203
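A related variant (a sketch of my own, not from the original answer; unlike the boolean mask above, it breaks ties by keeping only the first matching row per expiry):

# sort ascending by 'diff', keep the first (smallest) row per index label,
# then restore the original index order
atm_df = df.sort_values('diff').groupby(level=0).head(1).sort_index()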
Just a learning experience for me - I tried it out in pure Python:
from itertools import groupby
from operator import itemgetter

# convert to a numpy array of rows
m = df.reset_index().to_numpy()

# we'll use itertools' groupby;
# the data is already sorted, so I won't bother with that
# (groupby requires data to be sorted)
# the first item in each row, expiry, will be our grouping key
grp_key = itemgetter(0)
# we need the rows with the minimum for diff (the last column)
diff_min = itemgetter(-1)

columns = df.reset_index().columns
outcome = [dict(zip(columns, min(value, key=diff_min)))
           for key, value in groupby(m, grp_key)]
outcome
[{'expiry': Timestamp('2020-06-26 00:00:00'), 'strike': 138.5, 'diff': 0.188},
{'expiry': Timestamp('2020-07-24 00:00:00'), 'strike': 138.5, 'diff': 0.203}]
UPDATE: Thanks @steff for pointing me towards the dictionaries - the computation can be done on the dicts before reading anything into Pandas, if necessary. We'll use the same steps involving itemgetter and itertools' groupby:
# sort data, since groupby requires sorted input
data = sorted(data, key=itemgetter('expiry'))
outcome = [min(value, key=itemgetter("diff"))
           for _, value in groupby(data, key=itemgetter("expiry"))]
outcome
outcome
[{'expiry': datetime.datetime(2020, 6, 26, 0, 0),
'strike': 138.5,
'diff': 0.188},
{'expiry': datetime.datetime(2020, 7, 24, 0, 0),
'strike': 138.5,
'diff': 0.203}]
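For reference, a minimal single-pass sketch with a plain dict (the same idea as the second snippet in the first question on this page; it avoids sorting entirely):

best = {}
for d in data:
    e = d['expiry']
    # keep the dict with the smallest 'diff' seen so far for each expiry
    if e not in best or d['diff'] < best[e]['diff']:
        best[e] = d

outcome = list(best.values())
print(outcome)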

How to group by a key and sum other keys with Ramda?

Suppose I have an array of objects like this:
[
  {'prop_1': 'key_1', 'prop_2': 23, 'prop_3': 45},
  {'prop_1': 'key_1', 'prop_2': 56, 'prop_3': 10},
  {'prop_1': 'key_2', 'prop_2': 10, 'prop_3': 5},
  {'prop_1': 'key_2', 'prop_2': 6, 'prop_3': 7}
]
I would like to group by the first property and sum the values of the other properties, resulting in an array like this:
[
  {'prop_1': 'key_1', 'prop_2': 79, 'prop_3': 55},
  {'prop_1': 'key_2', 'prop_2': 16, 'prop_3': 12}
]
What is the correct way to do this using Ramda?
I have attempted to use the following:
R.pipe(
  R.groupBy(R.prop('prop_1')),
  R.values,
  R.reduce(R.mergeWith(R.add), {})
)
But this also sums the value of 'prop_1'.
You'll need to map the groups, and then reduce each group. Check the values of the currently merged keys: if the value is a Number, add the values; if not, just return the 1st value:
const { pipe, groupBy, prop, values, map, reduce, mergeWith, ifElse, is, add, identity } = R

const fn = pipe(
  groupBy(prop('prop_1')),
  values,
  map(reduce(
    mergeWith(ifElse(is(Number), add, identity)),
    {}
  ))
)

const data = [{"prop_1":"key_1","prop_2":23,"prop_3":45},{"prop_1":"key_1","prop_2":56,"prop_3":10},{"prop_1":"key_2","prop_2":10,"prop_3":5},{"prop_1":"key_2","prop_2":6,"prop_3":7}]

const result = fn(data)

console.log(result)

<script src="https://cdnjs.cloudflare.com/ajax/libs/ramda/0.27.0/ramda.js"></script>
If and only if you know the keys in advance, you could opt for something "simpler" (of course YMMV):
reduceBy takes a function similar to Array#reduce
Use prop('prop_1') as a "group by" clause
(It should be relatively straightforward to extract the values out of the object to get the final array.)
console.log(
  reduceBy
    ( (acc, obj) =>
        ({ prop_1: obj.prop_1
         , prop_2: acc.prop_2 + obj.prop_2
         , prop_3: acc.prop_3 + obj.prop_3
         })
      // acc initial value
    , { prop_1: ''
      , prop_2: 0
      , prop_3: 0
      }
      // group by clause
    , prop('prop_1')
    , data
    )
);

<script src="https://cdnjs.cloudflare.com/ajax/libs/ramda/0.27.0/ramda.min.js"></script>
<script>const {reduceBy, prop} = R;</script>
<script>
  const data = [{'prop_1': 'key_1', 'prop_2': 23, 'prop_3': 45}, {'prop_1': 'key_1', 'prop_2': 56, 'prop_3': 10}, {'prop_1': 'key_2', 'prop_2': 10, 'prop_3': 5}, {'prop_1': 'key_2', 'prop_2': 6, 'prop_3': 7}];
</script>
This answer is not as simple as the one from Ori Drori, and I'm not sure whether that's a good thing. But it seems to more closely fit the requirements, especially if "and sum the values of the other properties" is a simplification of actual requirements. It tries to keep the key property and then combine the others based on your function:
const data = [
  {'prop_1': 'key_1', 'prop_2': 23, 'prop_3': 45},
  {'prop_1': 'key_1', 'prop_2': 56, 'prop_3': 10},
  {'prop_1': 'key_2', 'prop_2': 10, 'prop_3': 5},
  {'prop_1': 'key_2', 'prop_2': 6, 'prop_3': 7}
]

const transform = (key, combine) => pipe (
  groupBy (prop (key)),
  map (map (omit ([key]))),
  map (combine),
  toPairs,
  map (([key, obj]) => ({prop_1: key, ...obj}))
)

console .log (
  transform ('prop_1', reduce (mergeWith (add), {})) (data)
)

.as-console-wrapper {max-height: 100% !important; top: 0}
<script src="//cdnjs.cloudflare.com/ajax/libs/ramda/0.27.0/ramda.js"></script>
<script> const {pipe, groupBy, prop, map, omit, reduce, mergeWith, add, toPairs, lift, merge, head, objOf, last} = R </script>
If you have a fetish for point-free code, that last line could be written as
map (lift (merge) (pipe (head, objOf(key)), last))
But as we are already making points of key and obj, I see no reason. Yes, we could change that, but I think it would become pretty ugly code.
There might well be something to be said for a more reduce-like version, where instead of passing such a combine function, we pass something that combines two values as well as a way to get the initial value. That's left as an exercise for later.