How to create a pandas dataframe from a CSV where one column contains a nested dictionary?

I have a CSV file and in one column there is a nested dictionary with the values of classification report, in a format like this one:
{'A': {'precision': 0.0, 'recall': 0.0, 'f1-score': 0.0, 'support': 60},
'B': {'precision': 0.42, 'recall': 0.09, 'f1-score': 0.14, 'support': 150},
'micro avg': {'precision': 0.31, 'recall': 0.31, 'f1-score': 0.31, 'support': 1710},
'macro avg': {'precision': 0.13, 'recall': 0.08, 'f1-score': 0.071, 'support': 1710},
'weighted avg': {'precision': 0.29, 'recall': 0.31, 'f1-score': 0.26, 'support': 1710}}
I would like each outer key combined with its inner key (e.g. A_precision) as a column in a data frame. So, is it possible to get the following result?
A_precision A_recall ... weighted_avg_precision weighted_avg_recall weighted_avg_f1-score weighted_avg_support
0.0 0.0 ... 0.29 0.31 0.26 1710
Thank you

You can use pd.json_normalize on that dictionary:
import pandas as pd

dct = {
    "A": {"precision": 0.0, "recall": 0.0, "f1-score": 0.0, "support": 60},
    "B": {"precision": 0.42, "recall": 0.09, "f1-score": 0.14, "support": 150},
    "micro avg": {"precision": 0.31, "recall": 0.31, "f1-score": 0.31, "support": 1710},
    "macro avg": {"precision": 0.13, "recall": 0.08, "f1-score": 0.071, "support": 1710},
    "weighted avg": {"precision": 0.29, "recall": 0.31, "f1-score": 0.26, "support": 1710},
}
df = pd.json_normalize(dct, sep="_")
print(df)
Prints:
A_precision A_recall A_f1-score A_support B_precision B_recall B_f1-score B_support micro avg_precision micro avg_recall micro avg_f1-score micro avg_support macro avg_precision macro avg_recall macro avg_f1-score macro avg_support weighted avg_precision weighted avg_recall weighted avg_f1-score weighted avg_support
0 0.0 0.0 0.0 60 0.42 0.09 0.14 150 0.31 0.31 0.31 1710 0.13 0.08 0.071 1710 0.29 0.31 0.26 1710
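Since the report arrives as a string inside a CSV column, you would first parse each cell, e.g. with ast.literal_eval, and then normalize. A minimal sketch, assuming a hypothetical "report" column holding the dict string:

```python
import ast
import io

import pandas as pd

# hypothetical CSV where the classification report is stored as a dict string
csv_text = 'model;report\nm1;"{\'A\': {\'precision\': 0.0, \'recall\': 0.0, \'f1-score\': 0.0, \'support\': 60}}"\n'

raw = pd.read_csv(io.StringIO(csv_text), sep=";")
reports = raw["report"].apply(ast.literal_eval)      # str -> dict
flat = pd.json_normalize(reports.tolist(), sep="_")  # one row per CSV row
print(flat)
```

ast.literal_eval is used rather than json.loads because the dict string uses single quotes, which is not valid JSON.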

Related

How to Stratify pandas DataFrame based on two columns?

I have the following pandas DataFrame:
import pandas as pd

account_num = [
1726905620833, 1727875510892, 1727925550921, 1727925575731, 1727345507414,
1713565531401, 1725735509119, 1727925546516, 1727925523656, 1727875509665,
1727875504742, 1727345504314, 1725475539855, 1791725523833, 1727925583805,
1727925544791, 1727925518810, 1727925606986, 1727925618602, 1727605517337,
1727605517354, 1727925583101, 1727925583201, 1727925583335, 1727025517810,
1727935718602]
total_due = [
1662.87, 3233.73, 3992.05, 10469.28, 799.01, 2292.98, 297.07, 5699.06, 1309.82,
1109.67, 4830.57, 3170.12, 45329.73, 46.71, 11981.58, 3246.31, 3214.25, 2056.82,
1611.73, 5386.16, 2622.02, 5011.02, 6222.10, 16340.90, 1239.23, 1198.98]
net_returned = [
0.0, 0.0, 0.0, 2762.64, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 12008.27,
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 2762.69, 0.0, 0.0, 0.0, 9254.66, 0.0, 0.0]
total_fees = [
0.0, 0.0, 0.0, 607.78, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 2161.49, 0.0,
0.0, 0.0, 0.0, 0.0, 0.0, 536.51, 0.0, 0.0, 0.0, 1712.11, 0.0, 0.0]
year = [2021, 2022, 2022, 2021, 2021, 2020, 2020, 2022, 2019, 2019, 2020, 2022, 2019,
2018, 2018, 2022, 2021, 2022, 2022, 2020, 2019, 2019, 2022, 2019, 2021, 2022]
flipped = [1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0]
proba = [
0.960085, 0.022535, 0.013746, 0.025833, 0.076159, 0.788912, 0.052489, 0.035279,
0.019701, 0.552127, 0.063949, 0.061279, 0.024398, 0.902681, 0.009441, 0.015342,
0.006832, 0.032988, 0.031879, 0.026412, 0.025159, 0.023195, 0.022104, 0.021285,
0.026480, 0.025837]
d = {
"account_num" : account_num,
"total_due" : total_due,
"net_returned" : net_returned,
"total_fees" : total_fees,
"year" : year,
"flipped" : flipped,
"proba" : proba
}
df = pd.DataFrame(data=d)
I want to sample the DataFrame by the "year" column according to a specific ratio for each year, which I have successfully done with the following code:
df_fractions = pd.DataFrame({"2018": [0.5], "2019": [0.5], "2020": [1.0], "2021": [0.8],
"2022": [0.7]})
df.year = df.year.astype(str)
grouped = df.groupby("year")
df_training = grouped.apply(lambda x: x.sample(frac=df_fractions[x.name]))
df_training = df_training.reset_index(drop=True)
However, when I invoke sample(), I also want the samples from each year to be stratified according to the number of flipped accounts in that year. So, I want to stratify the per-year samples based on the flipped column. With this small, toy DataFrame, the per-year ratios of flipped accounts after sampling stay pretty close to the original proportions, but this is not true for a really large DataFrame with close to 300K accounts.
So, that's really my question to all you Python experts: is there a better way to solve this problem than the solution I came up with?
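One way to keep each year's flipped ratio intact is to group on both columns, so every (year, flipped) stratum is sampled at that year's fraction. A sketch on a toy frame (the data and fractions here are illustrative, not the question's):

```python
import pandas as pd

# toy data: two years, each with a 50/50 flipped split
df = pd.DataFrame({
    "year":    [2019, 2019, 2019, 2019, 2020, 2020, 2020, 2020],
    "flipped": [0, 0, 1, 1, 0, 0, 1, 1],
    "proba":   [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8],
})
fractions = {2019: 0.5, 2020: 0.5}

# sampling within each (year, flipped) stratum preserves each year's flipped ratio;
# g.name is a (year, flipped) tuple, so g.name[0] looks up the year's fraction
sampled = (
    df.groupby(["year", "flipped"], group_keys=False)
      .apply(lambda g: g.sample(frac=fractions[g.name[0]], random_state=0))
)
print(sampled)
```

Each of the four strata has two rows, so sampling half of each yields four rows with every stratum represented.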

Reindex pandas DataFrame to match index with another DataFrame

I have two pandas DataFrames with different (float) indices.
I want to update the second dataframe to match the first dataframe's index, updating its values to be interpolated using the index.
This is the code I have:
from pandas import DataFrame
df1 = DataFrame([
{'time': 0.2, 'v': 1},
{'time': 0.4, 'v': 2},
{'time': 0.6, 'v': 3},
{'time': 0.8, 'v': 4},
{'time': 1.0, 'v': 5},
{'time': 1.2, 'v': 6},
{'time': 1.4, 'v': 7},
{'time': 1.6, 'v': 8},
{'time': 1.8, 'v': 9},
{'time': 2.0, 'v': 10}
]).set_index('time')
df2 = DataFrame([
{'time': 0.25, 'v': 1},
{'time': 0.5, 'v': 2},
{'time': 0.75, 'v': 3},
{'time': 1.0, 'v': 4},
{'time': 1.25, 'v': 5},
{'time': 1.5, 'v': 6},
{'time': 1.75, 'v': 7},
{'time': 2.0, 'v': 8},
{'time': 2.25, 'v': 9}
]).set_index('time')
df2 = df2.reindex(df1.index.union(df2.index)).interpolate(method='index').reindex(df1.index)
print(df2)
Output:
v
time
0.2 NaN
0.4 1.6
0.6 2.4
0.8 3.2
1.0 4.0
1.2 4.8
1.4 5.6
1.6 6.4
1.8 7.2
2.0 8.0
That's correct and what I need; however, it seems a more complicated statement than it needs to be.
Is there a more concise way to do the same, requiring fewer intermediate steps?
Also, is there a way to both interpolate and extrapolate? For example, in the example data above, the linearly extrapolated value for index 0.2 could be 0.8 instead of NaN. I know I could use curve_fit, but again I feel that's more complicated than it needs to be.
One idea is numpy.interp, provided the values in both indices are increasing and only the single column v needs processing:
import numpy as np

df1['v1'] = np.interp(df1.index, df2.index, df2['v'])
print(df1)
v v1
time
0.2 1 1.0
0.4 2 1.6
0.6 3 2.4
0.8 4 3.2
1.0 5 4.0
1.2 6 4.8
1.4 7 5.6
1.6 8 6.4
1.8 9 7.2
2.0 10 8.0
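Note that np.interp does not extrapolate; it clamps to the edge values, so the 0.2 entry comes out as 1.0 rather than 0.8. One numpy-only way to get linear extrapolation is to overwrite the out-of-range points using the slope of the nearest segment. A sketch using just the index and value arrays from the question:

```python
import numpy as np

x  = np.linspace(0.2, 2.0, 10)    # df1.index
xp = np.linspace(0.25, 2.25, 9)   # df2.index
fp = np.arange(1.0, 10.0)         # df2['v']

v = np.interp(x, xp, fp)          # clamps outside [xp[0], xp[-1]]

lo = x < xp[0]                    # extrapolate below the first known point
v[lo] = fp[0] + (x[lo] - xp[0]) * (fp[1] - fp[0]) / (xp[1] - xp[0])
hi = x > xp[-1]                   # extrapolate above the last known point
v[hi] = fp[-1] + (x[hi] - xp[-1]) * (fp[-1] - fp[-2]) / (xp[-1] - xp[-2])
print(v)
```

With this, index 0.2 yields the linearly extrapolated 0.8 mentioned in the question.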

Assign values to multicolumn dataframe using another dataframe

I am trying to assign values to a multicolumn dataframe that are stored in another, normal dataframe. The two dataframes share the same index; however, when attempting to assign the values for all columns of the normal dataframe to a slice of the multicolumn dataframe, NaN values appear.
MWE
import pandas as pd
df = pd.DataFrame.from_dict(
    {
        ("old", "mean"): {"high": 0.0, "med": 0.0, "low": 0.0},
        ("old", "std"): {"high": 0.0, "med": 0.0, "low": 0.0},
        ("new", "mean"): {"high": 0.0, "med": 0.0, "low": 0.0},
        ("new", "std"): {"high": 0.0, "med": 0.0, "low": 0.0},
    }
)
temp = pd.DataFrame.from_dict(
    {
        "old": {
            "high": 2.6798302797288174,
            "med": 10.546654056177656,
            "low": 16.46382603916123,
        },
        "new": {
            "high": 15.91881231611413,
            "med": 16.671967271277495,
            "low": 26.17872356316402,
        },
    }
)
df.loc[:, (slice(None), "mean")] = temp
print(df)
Output:
old new
mean std mean std
high NaN 0.0 NaN 0.0
med NaN 0.0 NaN 0.0
low NaN 0.0 NaN 0.0
Is this expected behaviour, or am I doing something horrible that I am not supposed to do?
Create a MultiIndex in temp so the data aligns; then you can set the new values with DataFrame.update:
temp.columns = pd.MultiIndex.from_product([temp.columns, ['mean']])
print (temp)
old new
mean mean
high 2.679830 15.918812
med 10.546654 16.671967
low 16.463826 26.178724
df.update(temp)
print(df)
old new
mean std mean std
high 2.679830 0.0 15.918812 0.0
med 10.546654 0.0 16.671967 0.0
low 16.463826 0.0 26.178724 0.0
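The NaN values in the question come from label alignment: temp's flat columns don't match the sliced MultiIndex columns. Alternatively, assigning the underlying array sidesteps alignment entirely; this relies on temp's column order matching the sliced columns (old before new), which holds in the MWE. A sketch with shortened values:

```python
import pandas as pd

df = pd.DataFrame.from_dict({
    ("old", "mean"): {"high": 0.0, "med": 0.0, "low": 0.0},
    ("old", "std"):  {"high": 0.0, "med": 0.0, "low": 0.0},
    ("new", "mean"): {"high": 0.0, "med": 0.0, "low": 0.0},
    ("new", "std"):  {"high": 0.0, "med": 0.0, "low": 0.0},
})
temp = pd.DataFrame.from_dict({
    "old": {"high": 2.68, "med": 10.55, "low": 16.46},
    "new": {"high": 15.92, "med": 16.67, "low": 26.18},
})

# .to_numpy() drops temp's labels, so the slice is filled positionally
df.loc[:, (slice(None), "mean")] = temp.to_numpy()
print(df)
```

Positional assignment is terser but silently depends on row and column order, whereas DataFrame.update aligns by label.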

How to implement bezier function in react native animation?

I want to implement a bezier curve animation, which is provided by Easing in React Native, but the docs are not very clear about how to implement it. Please, I need your suggestions.
Here on this repository you can see some examples of the use of react-native-easing:
react-native-easing
Here's the file on the repository:
import { Easing } from 'react-native';
export default {
step0: Easing.step0,
step1: Easing.step1,
linear: Easing.linear,
ease: Easing.ease,
quad: Easing.quad,
cubic: Easing.cubic,
poly: Easing.poly,
sin: Easing.sin,
circle: Easing.circle,
exp: Easing.exp,
elastic: Easing.elastic,
back: Easing.back,
bounce: Easing.bounce,
bezier: Easing.bezier,
in: Easing.in,
out: Easing.out,
inOut: Easing.inOut,
easeIn: Easing.bezier(0.42, 0, 1, 1),
easeOut: Easing.bezier(0, 0, 0.58, 1),
easeInOut: Easing.bezier(0.42, 0, 0.58, 1),
easeInCubic: Easing.bezier(0.55, 0.055, 0.675, 0.19),
easeOutCubic: Easing.bezier(0.215, 0.61, 0.355, 1.0),
easeInOutCubic: Easing.bezier(0.645, 0.045, 0.355, 1.0),
easeInCirc: Easing.bezier(0.6, 0.04, 0.98, 0.335),
easeOutCirc: Easing.bezier(0.075, 0.82, 0.165, 1.0),
easeInOutCirc: Easing.bezier(0.785, 0.135, 0.15, 0.86),
easeInExpo: Easing.bezier(0.95, 0.05, 0.795, 0.035),
easeOutExpo: Easing.bezier(0.19, 1.0, 0.22, 1.0),
easeInOutExpo: Easing.bezier(1.0, 0.0, 0.0, 1.0),
easeInQuad: Easing.bezier(0.55, 0.085, 0.68, 0.53),
easeOutQuad: Easing.bezier(0.25, 0.46, 0.45, 0.94),
easeInOutQuad: Easing.bezier(0.455, 0.03, 0.515, 0.955),
easeInQuart: Easing.bezier(0.895, 0.03, 0.685, 0.22),
easeOutQuart: Easing.bezier(0.165, 0.84, 0.44, 1.0),
easeInOutQuart: Easing.bezier(0.77, 0.0, 0.175, 1.0),
easeInQuint: Easing.bezier(0.755, 0.05, 0.855, 0.06),
easeOutQuint: Easing.bezier(0.23, 1.0, 0.32, 1.0),
easeInOutQuint: Easing.bezier(0.86, 0.0, 0.07, 1.0),
easeInSine: Easing.bezier(0.47, 0.0, 0.745, 0.715),
easeOutSine: Easing.bezier(0.39, 0.575, 0.565, 1.0),
easeInOutSine: Easing.bezier(0.445, 0.05, 0.55, 0.95),
easeInBack: Easing.bezier(0.6, -0.28, 0.735, 0.045),
easeOutBack: Easing.bezier(0.175, 0.885, 0.32, 1.275),
easeInOutBack: Easing.bezier(0.68, -0.55, 0.265, 1.55),
easeInElastic: Easing.out(Easing.elastic(2)),
easeInElasticCustom: (bounciness = 2) => Easing.out(Easing.elastic(bounciness)),
easeOutElastic: Easing.in(Easing.elastic(2)),
easeOutElasticCustom: (bounciness = 2) => Easing.in(Easing.elastic(bounciness)),
easeInOutElastic: Easing.inOut(Easing.out(Easing.elastic(2))),
easeInOutElasticCustom: (bounciness = 2) => Easing.inOut(Easing.out(Easing.elastic(bounciness))),
easeInBounce: Easing.out(Easing.bounce),
easeOutBounce: Easing.in(Easing.bounce),
easeInOutBounce: Easing.inOut(Easing.out(Easing.bounce)),
};
The repository also includes charts showing the curve that each easing function generates.

How to convert a Numpy array (Rows x Cols) to an array of XYZ coordinates?

I have an input array from a camera (greyscale image) that looks like:
[
[0.5, 0.75, 0.1, 0.6],
[0.3, 0.75, 1.0, 0.9]
]
actual size = 434x512
I need an output which is a list of XYZ coordinates:
i.e. [[x,y,z],[x,y,z],...]
[[0,0,0.5],[1,0,0.75],[2,0,0.1],[3,0,0.6],[0,1,0.3],[1,1,0.75],[2,1,1.0],[3,1,0.9]]
Are there any efficient ways to do this using Numpy?
Here's an approach, where a is your input array:
import numpy as np

m, n = a.shape
R, C = np.mgrid[:m, :n]
out = np.column_stack((C.ravel(), R.ravel(), a.ravel()))
Sample run -
In [45]: a
Out[45]:
array([[ 0.5 , 0.75, 0.1 , 0.6 ],
[ 0.3 , 0.75, 1. , 0.9 ]])
In [46]: m,n = a.shape
...: R,C = np.mgrid[:m,:n]
...: out = np.column_stack((C.ravel(),R.ravel(), a.ravel()))
...:
In [47]: out
Out[47]:
array([[ 0. , 0. , 0.5 ],
[ 1. , 0. , 0.75],
[ 2. , 0. , 0.1 ],
[ 3. , 0. , 0.6 ],
[ 0. , 1. , 0.3 ],
[ 1. , 1. , 0.75],
[ 2. , 1. , 1. ],
[ 3. , 1. , 0.9 ]])
In [48]: out.tolist() # Convert to list of lists if needed
Out[48]:
[[0.0, 0.0, 0.5],
[1.0, 0.0, 0.75],
[2.0, 0.0, 0.1],
[3.0, 0.0, 0.6],
[0.0, 1.0, 0.3],
[1.0, 1.0, 0.75],
[2.0, 1.0, 1.0],
[3.0, 1.0, 0.9]]
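For reference, np.indices builds the same row/column grids as np.mgrid and may read a bit more directly. A minimal sketch using the sample array from the question:

```python
import numpy as np

a = np.array([[0.5, 0.75, 0.1, 0.6],
              [0.3, 0.75, 1.0, 0.9]])

R, C = np.indices(a.shape)  # row and column index grids, each shaped like a
out = np.column_stack((C.ravel(), R.ravel(), a.ravel()))
print(out.tolist())
```

Both approaches are equivalent here; np.indices takes the shape tuple directly instead of slice syntax.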