Reshaping column values into rows with Identifier column at the end - pandas

I have power measurements from different sensors, e.g. A1_Pin, A2_Pin and so on. The measurements are stored in a file with one column per sensor, and each row has a unique timestamp.
df1 = pd.DataFrame({'DateTime': ['12/12/2019', '12/13/2019', '12/14/2019',
'12/15/2019', '12/16/2019'],
'A1_Pin': [2, 8, 8, 3, 9],
'A2_Pin': [1, 2, 3, 4, 5],
'A3_Pin': [85, 36, 78, 32, 75]})
I want to reshape the table so that each row holds one reading from one sensor. The last column should give the ID of the sensor the row belongs to.
The final table should look like:
df2 = pd.DataFrame({'DateTime': ['12/12/2019', '12/12/2019', '12/12/2019',
'12/13/2019', '12/13/2019','12/13/2019', '12/14/2019', '12/14/2019',
'12/14/2019', '12/15/2019','12/15/2019', '12/15/2019', '12/16/2019',
'12/16/2019', '12/16/2019'],
'Power': [2, 1, 85,8, 2, 36, 8,3,78, 3, 4, 32, 9, 5, 75],
'ModID': ['A1_PiN','A2_PiN','A3_PiN','A1_PiN','A2_PiN','A3_PiN',
'A1_PiN','A2_PiN','A3_PiN','A1_PiN','A2_PiN','A3_PiN',
'A1_PiN','A2_PiN','A3_PiN']})
I have tried groupby, melt, reshape, stack and loops, but could not get this result. Could anyone help? Thanks.

You were on the right track when you tried stack. You need to call set_index first and reset_index afterwards, like this:
df2 = df1.set_index('DateTime').stack().reset_index(name='Power')\
         .rename(columns={'level_1':'ModID'})  # to match the names in your expected output
And you get:
print (df2)
DateTime ModID Power
0 12/12/2019 A1_Pin 2
1 12/12/2019 A2_Pin 1
2 12/12/2019 A3_Pin 85
3 12/13/2019 A1_Pin 8
4 12/13/2019 A2_Pin 2
5 12/13/2019 A3_Pin 36
6 12/14/2019 A1_Pin 8
7 12/14/2019 A2_Pin 3
8 12/14/2019 A3_Pin 78
9 12/15/2019 A1_Pin 3
10 12/15/2019 A2_Pin 4
11 12/15/2019 A3_Pin 32
12 12/16/2019 A1_Pin 9
13 12/16/2019 A2_Pin 5
14 12/16/2019 A3_Pin 75

I'd try something like this:
df1.set_index('DateTime').unstack().reset_index()
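For comparison, melt (which the question mentions trying) can produce the same long table. This is only a sketch: it assumes pandas 1.1 or later for the ignore_index argument, and the final column selection is there just to match the column order shown in the question.

```python
import pandas as pd

df1 = pd.DataFrame({'DateTime': ['12/12/2019', '12/13/2019', '12/14/2019',
                                 '12/15/2019', '12/16/2019'],
                    'A1_Pin': [2, 8, 8, 3, 9],
                    'A2_Pin': [1, 2, 3, 4, 5],
                    'A3_Pin': [85, 36, 78, 32, 75]})

# melt keeps DateTime as the identifier column; ignore_index=False keeps the
# original row labels, so a stable sort on them groups the rows per timestamp
# while preserving the A1, A2, A3 column order inside each group.
df2 = (df1.melt(id_vars='DateTime', var_name='ModID', value_name='Power',
                ignore_index=False)
          .sort_index(kind='mergesort')
          .reset_index(drop=True)
          [['DateTime', 'Power', 'ModID']])
print(df2)
```

Sorting on the original row labels rather than on the DateTime strings avoids relying on how the date text happens to sort.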

Related

how to extract the list of values from one column in pandas

I wish to extract the lists of values stored in one column of a pandas DataFrame, and then use those values to create additional columns, one per value in the list.
My dataframe:
a = pd.DataFrame({"test":["","","","",[1,2,3,4,5,6,6],"","",[11,12,13,14,15,16,17]]})
Current output:
test
0
1
2
3
4 [1, 2, 3, 4, 5, 6, 6]
5
6
7 [11, 12, 13, 14, 15, 16, 17]
expected output:
example_1 example_2 example_3 example_4 example_5 example_6 example_7
0
1
2
3
4 1 2 3 4 5 6 6
5
6
7 11 12 13 14 15 16 17
Let's say we expect each list to have 7 values. That is true for most of my data, so being able to set that limit would be good. Thank you.
This should be what you're looking for. I replaced the nan values with blank cells, but you can change that of course.
a = pd.DataFrame({"test":["","","","",[1,2,3,4,5,6,6],"","",[11,12,13,14,15,16,17]]})
ab = a.test.apply(pd.Series).fillna("")
ab.columns = ['example_' + str(i) for i in range(1, 8)]
Output:
Edit: using .add_prefix() as the other answer uses is prettier than setting the column names manually with a list comprehension.
Here's a one-liner:
(a.test.apply(pd.Series)
    .fillna("")
    .set_axis(range(1, a.test.str.len().max() + 1), axis=1)
    .add_prefix("example_")
)
The set_axis is just to make the columns 1-indexed; if you don't mind 0-indexing, you can leave it out.
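The question also asks about fixing the number of columns up front. A minimal sketch of that, assuming a hypothetical helper named expand_lists (not a pandas function) that pads short lists and truncates long ones to n values:

```python
import pandas as pd

a = pd.DataFrame({"test": ["", "", "", "", [1, 2, 3, 4, 5, 6, 6],
                           "", "", [11, 12, 13, 14, 15, 16, 17]]})

def expand_lists(series, n, fill=""):
    """Hypothetical helper: one column per list position, exactly n columns."""
    rows = []
    for value in series:
        # Non-list cells (here, empty strings) become fully blank rows.
        items = list(value)[:n] if isinstance(value, list) else []
        rows.append(items + [fill] * (n - len(items)))
    cols = [f"example_{i}" for i in range(1, n + 1)]
    return pd.DataFrame(rows, index=series.index, columns=cols)

out = expand_lists(a["test"], n=7)
print(out)
```

Because the blank cells are strings, the resulting columns have object dtype; passing a different fill value changes that.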

rolling windows defined by backward cumulative sums

I have got a pandas DataFrame like this:
A B
0 3 ...
1 2
2 4
3 4
4 1
5 7
6 5
7 3
I would like to compute a rolling window along column A, summing its elements backwards until the sum reaches at least 10. The resulting windows should be:
A B window_indices
0 3 ... NA
1 2 NA
2 4 NA
3 4 --> [3,2,1]
4 1 [4,3,2,1]
5 7 [5,4,3]
6 5 [6,5]
7 3 [7,6,5]
Next, I want to compute some statistics on column B, something like that:
df.my_rolling(on='A', func='sum', threshold=10).B.mean()
I have an idea: we could treat the elements of column A as seconds, transform A into a datetime column, and perform a standard rolling on it. But I don't know how to do that.
This cannot be done with rolling, since the window size is not fixed. A list comprehension can build the windows instead:
l = [[df.index[(df.A.loc[:x].iloc[::-1].cumsum()>=10).idxmax():x+1].tolist()[::-1]
if (df.A.loc[:x].sum()>=10) else np.nan] for x in df.A.index]
Out[46]:
[[nan],
[nan],
[nan],
[[3, 2, 1]],
[[4, 3, 2, 1]],
[[5, 4, 3]],
[[6, 5]],
[[7, 6, 5]]]
df['new'] = l
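The same windows can also be built with an explicit loop, which makes it easier to add the statistic on column B that the question asks for. This is a sketch only: backward_windows is a hypothetical helper name, and the B values are made up because the question does not show them.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [3, 2, 4, 4, 1, 7, 5, 3],
                   "B": [10, 20, 30, 40, 50, 60, 70, 80]})  # B is invented here

def backward_windows(values, threshold):
    """For each position, walk backwards until the running sum reaches threshold."""
    windows = []
    for end in range(len(values)):
        total = 0
        window = None  # stays None when the threshold is never reached
        for start in range(end, -1, -1):
            total += values[start]
            if total >= threshold:
                window = list(range(end, start - 1, -1))
                break
        windows.append(window)
    return windows

windows = backward_windows(df["A"].to_numpy(), threshold=10)
df["window_indices"] = windows
# Mean of B over each window; rows without a complete window get NaN.
df["B_mean"] = [df["B"].iloc[w].mean() if w is not None else np.nan
                for w in windows]
print(df)
```

The windows are positional, so they match the index labels only for a default RangeIndex like the one in the question.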

Find pattern in pandas dataframe, reorder it row-wise, and reset index

This is a multipart problem. I have found solutions for each separate part, but when I try to combine these solutions, I don't get the outcome I want.
Let's say this is my dataframe:
df = pd.DataFrame(list(zip([1, 3, 6, 7, 7, 8, 4], [6, 7, 7, 9, 5, 3, 1])), columns = ['Values', 'Vals'])
df
Values Vals
0 1 6
1 3 7
2 6 7
3 7 9
4 7 5
5 8 3
6 4 1
Let's say I want to find the pattern [6, 7, 7] in the 'Values' column.
I can use a modified version of the second solution given here:
Pandas: How to find a particular pattern in a dataframe column?
pattern = [6, 7, 7]
pat_i = [df[i-len(pattern):i]  # get the matching rows
         for i in range(len(pattern), len(df) + 1)  # for each run of 3 consecutive elements (+1 so the last run is checked too)
         if all(df['Values'][i-len(pattern):i] == pattern)]  # if the pattern matches
pat_i
[ Values Vals
2 6 7
3 7 9
4 7 5]
The only way I've found to narrow this down to just index values is the following:
pat_i = [df.index[i-len(pattern):i]  # get the index of the matching rows
         for i in range(len(pattern), len(df) + 1)  # for each run of 3 consecutive elements (+1 so the last run is checked too)
         if all(df['Values'][i-len(pattern):i] == pattern)]  # if the pattern matches
pat_i
[RangeIndex(start=2, stop=5, step=1)]
Once I've found the pattern, what I want to do, within the original dataframe, is reorder the pattern to [7, 7, 6], moving the entire associated rows as I do this. In other words, going by the index, I want to get output that looks like this:
df.reindex([0, 1, 3, 4, 2, 5, 6])
Values Vals
0 1 6
1 3 7
3 7 9
4 7 5
2 6 7
5 8 3
6 4 1
Then, finally, I want to reset the index so that the values in all the columns stay in their new, re-ordered positions:
Values Vals
0 1 6
1 3 7
2 7 9
3 7 5
4 6 7
5 8 3
6 4 1
In order to use pat_i as a basis for re-ordering, I've tried to modify the second solution given here:
Python Pandas: How to move one row to the first row of a Dataframe?
target_row = 2
# Move target row to first element of list.
idx = [target_row] + [i for i in range(len(df)) if i != target_row]
However, I can't figure out how to exploit the pat_i RangeIndex object to use it with this code. The solution, when I find it, will be applied to hundreds of dataframes, each one of which will contain the [6, 7, 7] pattern that needs to be re-ordered in one place, but not the same place in each dataframe.
Any help appreciated...and I'm sure there must be an elegant, pythonic way of doing this, as it seems like it should be a common enough challenge. Thank you.
I just sort of rewrote your code. I held the first and last indexes to the side, reordered the indexes of interest, and put everything together in a new index. Then I just use the new index to reorder the data.
import pandas as pd

df = pd.DataFrame(list(zip([1, 3, 6, 7, 7, 8, 4], [6, 7, 7, 9, 5, 3, 1])), columns = ['Values', 'Vals'])
pattern = [6, 7, 7]
new_order = [1, 2, 0] # new order of pattern
# only positions holding the first pattern value can start a match
for i in list(df[df['Values'] == pattern[0]].index):
    # compare as lists so a short slice near the end cannot raise a length error
    if df['Values'][i:i+len(pattern)].tolist() == pattern:
        pat_i = df[i:i+len(pattern)]
front_ind = list(range(0, pat_i.index[0]))          # rows before the pattern
back_ind = list(range(pat_i.index[-1]+1, len(df)))  # rows after the pattern
pat_ind = [pat_i.index[i] for i in new_order]       # pattern rows in their new order
new_ind = front_ind + pat_ind + back_ind
df = df.loc[new_ind].reset_index(drop=True)
df
Out[82]:
Values Vals
0 1 6
1 3 7
2 7 9
3 7 5
4 6 7
5 8 3
6 4 1
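Since the question mentions applying this to hundreds of dataframes, the same idea can be wrapped in a function. A minimal sketch, assuming a hypothetical helper named reorder_pattern that reorders only the first occurrence of the pattern and returns an unchanged copy when there is no match:

```python
import pandas as pd

def reorder_pattern(df, column, pattern, new_order):
    """Hypothetical helper: move the rows of the first match into new_order."""
    values = df[column].tolist()
    n = len(pattern)
    for start in range(len(values) - n + 1):
        if values[start:start + n] == list(pattern):
            positions = list(range(len(df)))
            # Replace the matched block of positions with its reordered version.
            positions[start:start + n] = [start + k for k in new_order]
            return df.iloc[positions].reset_index(drop=True)
    return df.copy()  # pattern not found

df = pd.DataFrame(list(zip([1, 3, 6, 7, 7, 8, 4], [6, 7, 7, 9, 5, 3, 1])),
                  columns=['Values', 'Vals'])
result = reorder_pattern(df, 'Values', [6, 7, 7], new_order=[1, 2, 0])
print(result)
```

It works on positions with iloc, so it does not depend on the frame having a default integer index.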

How do I aggregate a pandas Dataframe while retaining all original data?

My goal is to aggregate a pandas DataFrame, grouping rows by an identity field. Notably, rather than just gathering summary statistics of the group, I want to retain all the information in the DataFrame in addition to summary statistics like mean, std, etc. I have performed this transformation via a lot of iteration, but I am looking for a cleaner/more pythonic approach. Notably, there may be more or less than 2 replicates per group, but all groups will always have the same number of replicates.
Example: I would like to translate the below format
df = pd.DataFrame([
["group1", 4, 10],
["group1", 8, 20],
["group2", 6, 30],
["group2", 12, 40],
["group3", 1, 50],
["group3", 3, 60]],
columns=['group','timeA', 'timeB'])
print(df)
group timeA timeB
0 group1 4 10
1 group1 8 20
2 group2 6 30
3 group2 12 40
4 group3 1 50
5 group3 3 60
into a df of the following format:
target = pd.DataFrame([
    ["group1", 4, 8, 6, 10, 20, 15],
    ["group2", 6, 12, 9, 30, 40, 35],
    ["group3", 1, 3, 2, 50, 60, 55]
], columns = ["group", "timeA.1", "timeA.2", "timeA.mean", "timeB.1", "timeB.2", "timeB.mean"])
print(target)
group timeA.1 timeA.2 timeA.mean timeB.1 timeB.2 timeB.mean
0 group1 4 8 6 10 20 15
1 group2 6 12 9 30 40 35
2 group3 1 3 2 50 60 55
Finally, it doesn't really matter what the column names are, these ones are just to make the example more clear. Thanks!
EDIT: As suggested by a user in the comments, I tried the solution from the linked Q/A without success:
df.insert(0, 'count', df.groupby('group').cumcount())
df.pivot(*df)
TypeError: pivot() takes from 1 to 4 positional arguments but 5 were given
Try with pivot_table:
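out = (df.assign(col=df.groupby('group').cumcount()+1)   # replicate number within each group
         .pivot_table(index='group', columns='col',
                      margins=True, margins_name='mean')  # margins is a boolean flag; the default aggfunc is the mean
         .drop('mean')                                    # drop the margin row, keep the margin columns
      )
out.columns = [f'{x}.{y}' for x,y in out.columns]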
Output:
timeA.1 timeA.2 timeA.mean timeB.1 timeB.2 timeB.mean
group
group1 4.0 8.0 6.0 10 20 15
group2 6.0 12.0 9.0 30 40 35
group3 1.0 3.0 2.0 50 60 55
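An alternative that avoids margins is to pivot the replicates and compute the means separately, then join the two. This is a sketch under the question's assumption that every group has the same number of replicates; the names rep, wide, means and order are just illustrative.

```python
import pandas as pd

df = pd.DataFrame([
    ["group1", 4, 10],
    ["group1", 8, 20],
    ["group2", 6, 30],
    ["group2", 12, 40],
    ["group3", 1, 50],
    ["group3", 3, 60]],
    columns=['group', 'timeA', 'timeB'])

value_cols = ['timeA', 'timeB']
rep = df.groupby('group').cumcount() + 1          # replicate number within each group

# One column per (measurement, replicate), passed to pivot by keyword.
wide = df.assign(rep=rep).pivot(index='group', columns='rep', values=value_cols)
wide.columns = [f'{col}.{r}' for col, r in wide.columns]

# Per-group means, named to match the target layout.
means = df.groupby('group')[value_cols].mean().add_suffix('.mean')

# Explicit column order: replicates first, then the mean, per measurement.
order = [f'{col}.{s}' for col in value_cols
         for s in [*range(1, rep.max() + 1), 'mean']]
target = wide.join(means)[order].reset_index()
print(target)
```

Swapping mean for agg(['mean', 'std']) would need the suffix handling adjusted, so treat this as a starting point rather than a general solution.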

Pandas DataFrame.update with MultiIndex label

Given a DataFrame A with a MultiIndex and a DataFrame B with a one-dimensional index, how can I update column values of A with new values from B, where the index of B is matched against the second index level of A?
Test data:
begin = [10, 10, 12, 12, 14, 14]
end = [10, 11, 12, 13, 14, 15]
values = [1, 2, 3, 4, 5, 6]
values_updated = [10, 20, 3, 4, 50, 60]
multiindexed = pd.DataFrame({'begin': begin,
'end': end,
'value': values})
multiindexed.set_index(['begin', 'end'], inplace=True)
singleindexed = pd.DataFrame.from_dict(dict(zip([10, 11, 14, 15],
[10, 20, 50, 60])),
orient='index')
singleindexed.columns = ['value']
And the desired result should be
value
begin end
10 10 10
11 20
12 12 3
13 4
14 14 50
15 60
Now I was thinking about a variant of
multiindexed.update(singleindexed)
I searched the docs of DataFrame.update, but could not find anything w.r.t. index handling.
Am I missing an easier way to accomplish this?
You can use loc to select the matching rows in multiindexed and then assign the new values from singleindexed.values:
print(singleindexed.index)
Int64Index([10, 11, 14, 15], dtype='int64')
print(singleindexed.values)
[[10]
[20]
[50]
[60]]
idx = pd.IndexSlice
print(multiindexed.loc[idx[:, singleindexed.index],:])
value
begin end
10 10 1
11 2
14 14 5
15 6
multiindexed.loc[idx[:, singleindexed.index],:] = singleindexed.values
print(multiindexed)
value
begin end
10 10 10
11 20
12 12 3
13 4
14 14 50
15 60
See "Using slicers" in the pandas docs.
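Assigning .values relies on the selected rows coming back in the same order as singleindexed. If that is not guaranteed, one option is to look up each row's end label explicitly. A sketch, assuming the labels in singleindexed are unique; ends and mask are just illustrative names:

```python
import pandas as pd

multiindexed = pd.DataFrame({'begin': [10, 10, 12, 12, 14, 14],
                             'end': [10, 11, 12, 13, 14, 15],
                             'value': [1, 2, 3, 4, 5, 6]}).set_index(['begin', 'end'])
singleindexed = pd.DataFrame({'value': [10, 20, 50, 60]}, index=[10, 11, 14, 15])

# Labels of the second index level, one per row of multiindexed.
ends = multiindexed.index.get_level_values('end')
# Rows whose 'end' label has an update available.
mask = ends.isin(singleindexed.index)
# Look up the new value for each selected row by its 'end' label.
multiindexed.loc[mask, 'value'] = singleindexed['value'].reindex(ends[mask]).to_numpy()
print(multiindexed)
```

This version does not require the MultiIndex to be sorted, and it also works when the same end label appears under several begin values.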