Appending a list as a DataFrame row - pandas

I have a list of counts with 5 elements.
counts = [33, 35, 17, 38, 29]
This counts list is updated with new numbers every day, so I want to create a dataframe and append the counts as a new row every day. Every element of the list should appear in a separate column of the dataframe. What I want to do is the following:
df = pd.DataFrame(columns = ['w1', 'w2', 'w3', 'w4', 'w5'])
df = df.append(counts)
but instead of adding the counts as a row, it adds them as a new column. Any help on how to do this correctly?
Assume counts on day0 is [33, 35, 17, 38, 29] and on day1 is [30, 36, 20, 34, 32]; what I want as output is the following:
w1 w2 w3 w4 w5
0 33 35 17 38 29
1 30 36 20 34 32
where the index represents the day on which the counts were taken. Any help?

Appending to a DataFrame is possible, but because it is slow when there are many rows, it is better to create a list of lists and build the DataFrame with the constructor:
counts = [33, 35, 17, 38, 29]
counts1 = [37, 8, 1, 2, 0]
L = [counts, counts1]
df = pd.DataFrame(L, columns = ['w1', 'w2', 'w3', 'w4', 'w5'])
print (df)
w1 w2 w3 w4 w5
0 33 35 17 38 29
1 37 8 1 2 0
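If the counts arrive one day at a time, the same idea still applies: collect each day's list first and build the DataFrame once at the end (a minimal sketch; daily_counts stands in for however the lists actually arrive):
daily_counts = [[33, 35, 17, 38, 29], [30, 36, 20, 34, 32]]  # e.g. day0, day1
rows = []
for counts in daily_counts:  # in practice, append one list per day
    rows.append(counts)
df = pd.DataFrame(rows, columns=['w1', 'w2', 'w3', 'w4', 'w5'])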
But if you really need to append one row at a time to an existing DataFrame, e.g. only one row daily, then it is necessary to create a Series whose index values match the columns:
df = pd.DataFrame(columns = ['w1', 'w2', 'w3', 'w4', 'w5'])
counts = [33, 35, 17, 38, 29]
s = pd.Series(counts, index=df.columns)
df = df.append(s, ignore_index=True)
print (df)
w1 w2 w3 w4 w5
0 33 35 17 38 29
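Note that DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, so on current versions the same one-row step can be written with pd.concat (a minimal sketch of the equivalent append):
new_row = pd.DataFrame([counts], columns=df.columns)
df = pd.concat([df, new_row], ignore_index=True)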

If you know day0 and day1 in the beginning, then you want to construct it as jezrael suggests.
But I'm assuming that you want to be able to add a new row in a loop.
Solution with loc
When using loc you need to use an index value that doesn't already exist. In this case, I'm assuming that we are maintaining a generic RangeIndex. If that is the case, I'll assume that the next index is the same as the length of the current DataFrame.
df.loc[len(df), :] = counts
df
w1 w2 w3 w4 w5
0 33 35 17 38 29
Let's make the loop.
day0 = [33, 35, 17, 38, 29]
day1 = [30, 36, 20, 34, 32]
for counts in [day0, day1]:
    df.loc[len(df), :] = counts
df
w1 w2 w3 w4 w5
0 33.0 35.0 17.0 38.0 29.0
1 30.0 36.0 20.0 34.0 32.0
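Note that the values come out as floats: enlarging the frame row by row upcasts the dtype. If integers are needed, cast once at the end (my addition, not part of the original answer):
df = df.astype(int)  # restore integer dtype after the float upcast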

import random
import pandas as pd

ran_list = []
for _ in range(12):
    ran_list.append(round(random.uniform(133.33, 266.66), 3))
print(ran_list)
df = pd.DataFrame([ran_list])  # wrap the list in another list so it becomes a single row
df


Create column with values only for some multiindex in pandas

I have a dataframe like this:
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(50, size=(4, 4)),
                  index=[['a', 'a', 'b', 'b'], [800, 900, 800, 900]],
                  columns=['X', 'Y', 'r_value', 'z_value'])
df.index.names = ["dat", "recor"]
X Y r_value z_value
dat recor
a 800 14 28 12 18
900 47 34 59 49
b 800 33 18 24 33
900 18 25 44 19
...
I want to apply a function to create a new column based on r_value that gives values only for the case of recor==900, so, in the end I would like something like:
X Y r_value z_value BB
dat recor
a 800 14 28 12 18 NaN
900 47 34 59 49 0
b 800 33 18 24 33 NaN
900 18 25 44 19 2
...
I have created the function like:
x = df.loc[pd.IndexSlice[:, 900], "r_value"]
conditions = [x >= 70, np.logical_and(x >= 40, x < 70),
              np.logical_and(x >= 10, x < 40), x < 10]
choices = [0, 1, 2, 3]
BB = np.select(conditions, choices)
So now I need to append BB as a column, filling with NaNs the rows corresponding to recor==800. How can I do it? I have tried a couple of ideas (not commented here) without result. Thx.
Try
df.loc[df.index.get_level_values('recor')==900, 'BB'] = BB
The part df.index.get_level_values('recor')==900 creates a boolean array that is True where the index level "recor" equals 900.
Indexing with a column that does not already exist, i.e. "BB", creates that new column.
The rest of the column is automatically filled with NaN.
I can't test it since you didn't include a minimal reproducible example.
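Putting the pieces together against the question's df (a sketch of the full flow; rows where recor == 800 are left as NaN automatically):
x = df.loc[pd.IndexSlice[:, 900], "r_value"]
conditions = [x >= 70, np.logical_and(x >= 40, x < 70),
              np.logical_and(x >= 10, x < 40), x < 10]
BB = np.select(conditions, [0, 1, 2, 3])
df.loc[df.index.get_level_values('recor') == 900, 'BB'] = BB
print(df)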

Using pandas, how to join two tables on variable index?

There are two tables, and the entries may have different ID types. I need to join the two tables based on the id_type of df1 and the corresponding column of df2. For background, the IDs are security IDs in the financial world; the ID type may be CUSIP, ISIN, RIC, etc.
print(df1)
id id_type value
0 11 type_A 0.1
1 22 type_B 0.2
2 13 type_A 0.3
print(df2)
type_A type_B type_C
0 11 21 xx
1 12 22 yy
2 13 23 zz
The desired output is
type_A type_B type_C value
0 11 21 xx 0.1
1 12 22 yy 0.2
2 13 23 zz 0.3
Here is an alternative approach, which generalizes to many security types (CUSIP, ISIN, RIC, SEDOL, etc.).
First, create df1 and df2 along the lines of the original example:
import numpy as np
import pandas as pd
df1 = pd.DataFrame({'sec_id': [11, 22, 33],
                    'sec_id_type': ['CUSIP', 'ISIN', 'RIC'],
                    'value': [100, 200, 300]})
df2 = pd.DataFrame({'CUSIP': [11, 21, 31],
                    'ISIN': [21, 22, 23],
                    'RIC': [31, 32, 33],
                    'SEDOL': [41, 42, 43]})
Second, create an intermediate data frame x1. We will use the first column for one join, and the second and third columns for a different join:
index = [idx for idx in df2.index for _ in df2.columns]
sec_id_types = df2.columns.to_list() * df2.shape[0]
sec_ids = df2.values.ravel()
data = [
    (idx, sec_id_type, sec_id)
    for idx, sec_id_type, sec_id in zip(index, sec_id_types, sec_ids)
]
x1 = pd.DataFrame.from_records(data, columns=['index', 'sec_id_type', 'sec_id'])
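As an aside (my addition, not part of the original answer), the same long-format frame can be built with melt, assuming pandas >= 1.1 for the ignore_index argument; only the row order differs:
x1 = (df2.melt(var_name='sec_id_type', value_name='sec_id', ignore_index=False)
         .rename_axis('index')
         .reset_index())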
Join df1 and x1 to extract values from df1:
x2 = (x1.merge(df1, on=['sec_id_type', 'sec_id'], how='left')
        .dropna()
        .set_index('index'))
Finally, join df2 and x2 (from the previous step) to get the final result:
print(df2.merge(x2, left_index=True, right_index=True, how='left'))
CUSIP ISIN RIC SEDOL sec_id_type sec_id value
0 11 21 31 41 CUSIP 11 100.0
1 21 22 32 42 ISIN 22 200.0
2 31 23 33 43 RIC 33 300.0
The columns sec_id_type and sec_id show the joins work as expected.
NEW Solution 1: create a temporary column that determines the ID with np.where
df2['id'] = np.where(df2['type_A'] == df1['id'], df2['type_A'], df2['type_B'])
df = pd.merge(df2,df1[['id','value']],how='left',on='id').drop('id', axis=1)
NEW Solution 2: Can you simply merge on the index? If not go with solution #1.
df = pd.merge(df2, df1['value'], how ='left', left_index=True, right_index=True)
output:
type_A type_B type_C value
0 11 21 xx 0.1
1 12 22 yy 0.2
2 13 23 zz 0.3
OLD Solution:
Through a combination of pd.merge, pd.melt and pd.concat, I found a solution, although I wonder if there is a shorter way (probably):
df_A_B = pd.merge(df2[['type_A']], df2[['type_B']], how='left', left_index=True, right_index=True) \
           .melt(var_name='id_type', value_name='id')
df_C = pd.concat([df2[['type_C']]] * 2).reset_index(drop=True)
df_A_B_C = pd.merge(df_A_B, df_C, how='left', left_index=True, right_index=True)
df3 = pd.merge(df_A_B_C, df1, how='left', on=['id_type', 'id']).dropna().drop(['id_type', 'id'], axis=1)
df4 = pd.merge(df2, df3, how='left', on=['type_C'])
df4
output:
type_A type_B type_C value
0 11 21 xx 0.1
1 12 22 yy 0.2
2 13 23 zz 0.3

Reshaping column values into rows with Identifier column at the end

I have measurements of Power related to different sensors, i.e. A1_Pin, A2_Pin and so on. These measurements are recorded in a file as columns. The data is uniquely recorded with timestamps.
df1 = pd.DataFrame({'DateTime': ['12/12/2019', '12/13/2019', '12/14/2019',
                                 '12/15/2019', '12/16/2019'],
                    'A1_Pin': [2, 8, 8, 3, 9],
                    'A2_Pin': [1, 2, 3, 4, 5],
                    'A3_Pin': [85, 36, 78, 32, 75]})
I want to reshape the table so that each row corresponds to one sensor. The last column indicates the sensor ID to which the row data belongs.
The final table should look like:
df2 = pd.DataFrame({'DateTime': ['12/12/2019', '12/12/2019', '12/12/2019',
                                 '12/13/2019', '12/13/2019', '12/13/2019',
                                 '12/14/2019', '12/14/2019', '12/14/2019',
                                 '12/15/2019', '12/15/2019', '12/15/2019',
                                 '12/16/2019', '12/16/2019', '12/16/2019'],
                    'Power': [2, 1, 85, 8, 2, 36, 8, 3, 78, 3, 4, 32, 9, 5, 75],
                    'ModID': ['A1_Pin', 'A2_Pin', 'A3_Pin', 'A1_Pin', 'A2_Pin', 'A3_Pin',
                              'A1_Pin', 'A2_Pin', 'A3_Pin', 'A1_Pin', 'A2_Pin', 'A3_Pin',
                              'A1_Pin', 'A2_Pin', 'A3_Pin']})
I have tried groupby, melt, reshape, stack and loops but could not get it to work. Could anyone help? Thanks
When you tried stack, you were on the right track. You need to set_index first and reset_index afterwards, such as:
df2 = df1.set_index('DateTime').stack().reset_index(name='Power')\
         .rename(columns={'level_1': 'ModID'})  # to match the names in your expected output
And you get:
print (df2)
DateTime ModID Power
0 12/12/2019 A1_Pin 2
1 12/12/2019 A2_Pin 1
2 12/12/2019 A3_Pin 85
3 12/13/2019 A1_Pin 8
4 12/13/2019 A2_Pin 2
5 12/13/2019 A3_Pin 36
6 12/14/2019 A1_Pin 8
7 12/14/2019 A2_Pin 3
8 12/14/2019 A3_Pin 78
9 12/15/2019 A1_Pin 3
10 12/15/2019 A2_Pin 4
11 12/15/2019 A3_Pin 32
12 12/16/2019 A1_Pin 9
13 12/16/2019 A2_Pin 5
14 12/16/2019 A3_Pin 75
I'd try something like this:
df1.set_index('DateTime').unstack().reset_index()
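For completeness (my addition, not from either answer), melt produces the expected long format in a single call; the row order differs from the stacked version but the content is the same:
df2 = df1.melt(id_vars='DateTime', var_name='ModID', value_name='Power')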

plotting histograms in pandas

I am looking to plot two sets of data, each with 10 points, in overlapping bins.
values1 = [29, 31, 24, 30, 30, 14, 25, 35, 27, 31]
values2 = [36, 29, 29, 29, 34, 33, 27, 34, 36, 39]
When I add them to a dataframe they come out as 2 columns.
I am looking to plot 2 rows, each with 10 overlapping columns.
df1 = pd.DataFrame(values1, values2)
and subsequently, when I plot them as histograms, they do not come out correctly:
df1.plot.hist(stacked = True)
plt.show()
So my aim is to do a pairwise comparison between the numbers in the arrays: 29 - 36, 31 - 29, 24 - 29, etc.
I would like to plot them so that they overlap, as in this example:
http://pandas.pydata.org/pandas-docs/stable/_images/hist_new_stacked.png
However, I have only two sets of values instead of three as in the example.
You can pass them as values to a dict:
values1 = [29, 31, 24, 30, 30, 14, 25, 35, 27, 31]
values2 = [36, 29, 29, 29, 34, 33, 27, 34, 36, 39]
df1 = pd.DataFrame({'values1':values1, 'values2':values2})
df1.plot.hist(stacked = True)
What you did caused the constructor to interpret the first list as a single column of data and the second list as the index values:
pandas.DataFrame(data=None, index=None, columns=None, dtype=None, copy=False)
Compare the difference:
In [166]:
pd.DataFrame(values1, values2)
Out[166]:
0
36 29
29 31
29 24
29 30
34 30
33 14
27 25
34 35
36 27
39 31
In [167]:
pd.DataFrame({'values1':values1, 'values2':values2})
Out[167]:
values1 values2
0 29 36
1 31 29
2 24 29
3 30 29
4 30 34
5 14 33
6 25 27
7 35 34
8 27 36
9 31 39
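Since the stated goal is overlapping rather than stacked bars, plotting with transparency may be closer to the linked example (a sketch, assuming matplotlib is installed):
import matplotlib.pyplot as plt

df1.plot.hist(alpha=0.5, bins=10)  # semi-transparent bars let the two histograms overlap
plt.show()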

Pandas DataFrame.update with MultiIndex label

Given a DataFrame A with a MultiIndex and a DataFrame B with a one-dimensional index, how do I update column values of A with new values from B, where the index of B should be matched against the second index level of A?
Test data:
begin = [10, 10, 12, 12, 14, 14]
end = [10, 11, 12, 13, 14, 15]
values = [1, 2, 3, 4, 5, 6]
values_updated = [10, 20, 3, 4, 50, 60]
multiindexed = pd.DataFrame({'begin': begin,
                             'end': end,
                             'value': values})
multiindexed.set_index(['begin', 'end'], inplace=True)
singleindexed = pd.DataFrame.from_dict(dict(zip([10, 11, 14, 15],
                                                [10, 20, 50, 60])),
                                       orient='index')
singleindexed.columns = ['value']
And the desired result should be
value
begin end
10 10 10
11 20
12 12 3
13 4
14 14 50
15 60
Now I was thinking about a variant of
multiindexed.update(singleindexed)
I searched the docs of DataFrame.update, but could not find anything w.r.t. index handling.
Am I missing an easier way to accomplish this?
You can use loc to select the matching rows of multiindexed and then assign the new values from singleindexed.values:
print(singleindexed.index)
Int64Index([10, 11, 14, 15], dtype='int64')
print(singleindexed.values)
[[10]
[20]
[50]
[60]]
idx = pd.IndexSlice
print(multiindexed.loc[idx[:, singleindexed.index], :])
value
begin end
10 10 1
11 2
14 14 5
15 6
multiindexed.loc[idx[:, singleindexed.index], :] = singleindexed.values
print(multiindexed)
value
begin end
10 10 10
11 20
12 12 3
13 4
14 14 50
15 60
See Using slicers in the docs.
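A level-based alternative (my sketch, not part of the original answer) avoids relying on positional alignment: map the 'end' level to the new values and keep the old value where there is no match:
# map each row's 'end' label to its replacement value; NaN where no match exists
new_vals = pd.Series(multiindexed.index.get_level_values('end'),
                     index=multiindexed.index).map(singleindexed['value'])
multiindexed['value'] = new_vals.fillna(multiindexed['value']).astype(int)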