Select a specific row from a multiindex dataframe in pandas

I would like to select the last row from a multiindex dataframe and append it to a dict of buy and sell signals. For example, given the multiindex dataframe below:
[screenshot of the multiindex dataframe]
I would like to select the last row (indexed by HK.00700 and 2022-06-28 10:39:00) and add it to the dict as follows, while keeping the last row's multiindex values:
[screenshot of the resulting dict]
The indices in the second screenshot are slightly different, but the idea is the same.

Reproduce your data:
import pandas as pd

level = [['HK.00700'], [pd.Timestamp('2022-06-28 10:38:00'), pd.Timestamp('2022-06-28 10:39:00')]]
level_index = pd.MultiIndex.from_product(level, names=['code', 'time_key'])
transaction = {
    'open': [360.6, 360.8],
    'close': [360.6, 361.4],
    'high': [360.8, 361.4],
    'low': [360.4, 360.4],
    'volume': [72500, 116300],
    'upper_band': [360.906089, 361.180835],
    'lower_band': [357.873911, 357.719165]
}
df = pd.DataFrame(data=transaction, index=level_index)
df
It is easy if you only want to select the last row:
df.tail(1)
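If you instead need a specific row rather than the last one, you can select it by its full index tuple; a minimal sketch using .loc with the labels from the data above:
# select one row by its (code, time_key) multiindex tuple
row = df.loc[('HK.00700', pd.Timestamp('2022-06-28 10:39:00'))]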
Turn it into a dict:
df.tail(1).reset_index().loc[0].to_dict()
Output:
{'code': 'HK.00700',
'time_key': Timestamp('2022-06-28 10:39:00'),
'open': 360.8,
'close': 361.4,
'high': 361.4,
'low': 360.4,
'volume': 116300,
'upper_band': 361.180835,
'lower_band': 357.719165}
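To then feed this into a running collection of buy and sell signals, one option is to key a plain dict by the row's multiindex tuple so the (code, time_key) labels are preserved. A minimal sketch, where signals and the 'side' label are hypothetical names for illustration:
# 'signals' and 'side' are illustrative names; the multiindex tuple is kept as the key
signals = {}
last = df.tail(1)
key = last.index[0]  # ('HK.00700', Timestamp('2022-06-28 10:39:00'))
signals[key] = {'side': 'buy', **last.iloc[0].to_dict()}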

Related

Convert a column to a list of previous columns in a Dataframe

I would like to create a column in the form of a list of values from two existing columns, such as a location column made up of the long and lat columns.
This is what the DataFrame looks like:
[screenshot of the DataFrame]
You can create a new column based on other columns using zip, as follows:
import pandas as pd

df = pd.DataFrame({
    'admin_port': ['NORTH SHIELDS', 'OBAN'],
    'longitude': [-1.447104, -5.473469],
    'latitude': [55.008766, 54.415695],
})
# pair longitude and latitude row-wise into tuples
df['new'] = pd.Series(list(zip(df['longitude'].values, df['latitude'].values)))
print(df)
Output:
      admin_port  longitude   latitude                     new
0  NORTH SHIELDS  -1.447104  55.008766  (-1.447104, 55.008766)
1           OBAN  -5.473469  54.415695  (-5.473469, 54.415695)
For your information, you can see how to use zip() here: https://www.w3schools.com/python/ref_func_zip.asp
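Alternatively, if you prefer lists over tuples, you can slice the two columns and convert the underlying array; a short sketch of the same idea:
# each row of the two-column slice becomes a [longitude, latitude] list
df['new'] = df[['longitude', 'latitude']].values.tolist()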

Pandas Dataframe Flatten values to cell based on column value

I have a simple data frame that I am reading in from Excel. To process further, I need to combine the "store" values into a list in one cell that corresponds to the given zone. To clarify, I want to have only one row per zone. In the corresponding "store" column will be a list of all the corresponding stores in one cell.
Current state: [screenshot with one row per store]
Desired state: [screenshot with one row per zone and the stores collected into a list]
I have tried to implement melt, with no success:
store_df = pd.read_excel("Zones_by_Store.xlsx")
store_df.groupby(store_df['Price Zone Name'])
pd.melt(store_df, id_vars=['Price Zone Name'], value_vars=['Store No'])
store_df.to_csv('Stores.csv')
Try this:
import pandas as pd

price_zone = ['CA2', 'CA2', 'CA2']
store_num = [112, 162, 726]
df = pd.DataFrame(price_zone, columns=['Price Zone'])
df['Store No'] = store_num
# collect all store numbers for each zone into a single list
df = (df.groupby(['Price Zone'])
        .agg({'Store No': lambda x: x.tolist()})
        .reset_index())
print(df)
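This prints one row per zone with the store numbers gathered into a list:
  Price Zone         Store No
0        CA2  [112, 162, 726]
Applied to the data read from Excel, using the column names from the question, the same idea would look like this sketch:
store_df = pd.read_excel("Zones_by_Store.xlsx")
out = store_df.groupby('Price Zone Name')['Store No'].agg(list).reset_index()
out.to_csv('Stores.csv', index=False)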

Numpy where perform multiple actions

I have two dataframe columns where I want to check whether the elements of one are inside the other. I perform this check using the pandas isin method.
However, if the element is present in the second dataframe, I also want to subtract it from both:
attivo['S'] = np.where(attivo['SKU'].isin(stampate['SKU-S']), attivo['S'] - 1, attivo['S'])
In this example, if an item in the SKU column of the attivo dataframe is present in the SKU-S column of the stampate dataframe, the S column of attivo decreases by one unit; however, I also want the S column in the stampate dataframe to decrease as well.
How is it possible to achieve this?
EDIT with sample data:
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'SKU': 'productSKU', 'S': 5}, index=[0])
df2 = pd.DataFrame({'SKU-S': 'productSKU', 'S': 5}, index=[0])
Currently, I am achieving this:
df1['S'] = np.where(df1['SKU'].isin(df2['SKU-S']), df1['S'] - 1, df1['S'])
However, I would like that both dataframes are updated, in this case, both of them will display 4 in the S column.
IIUC (if I understand correctly):
s = df1['SKU'].isin(df2['SKU-S'])
# modify df1
df1['S'] -= s
# count the SKU in df1 that belongs to df2 by values
counts = df1['SKU'].where(s).value_counts()
# modify df2
df2['S'] -= df2['SKU-S'].map(counts).fillna(0)
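With the sample data above, both frames end up with 4 in the S column; a quick check:
print(df1)  #           SKU  S
            # 0  productSKU  4
print(df2)  #         SKU-S  S
            # 0  productSKU  4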

concat series onto dataframe with column name

I want to add a Series (s) to a Pandas DataFrame (df) as a new column. The series has more values than there are rows in the dataframe, so I am using the concat method along axis 1.
df = pd.concat((df, s), axis=1)
This works, but the new column of the dataframe representing the series is given an arbitrary numerical column name, and I would like this column to have a specific name instead.
Is there a way to add a series to a dataframe, when the series is longer than the rows of the dataframe, and with a specified column name in the resulting dataframe?
You can try Series.rename:
df = pd.concat((df, s.rename('col')), axis=1)
One option is simply to specify the name when creating the series:
example_scores = pd.Series([1,2,3,4], index=['t1', 't2', 't3', 't4'], name='example_scores')
Using the name attribute when creating the series is all I needed.
Try:
df = pd.concat((df, s.rename('CoolColumnName')), axis=1)
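Note that pd.concat(..., axis=1) aligns on the index: the extra rows of the series are kept, and the dataframe's own columns get NaN there. A small illustration with toy data (names assumed for the example):
import pandas as pd

df = pd.DataFrame({'a': [1, 2]})           # 2 rows, index 0..1
s = pd.Series([10, 20, 30], name='col')    # 3 values, index 0..2
print(pd.concat((df, s), axis=1))
#      a  col
# 0  1.0   10
# 1  2.0   20
# 2  NaN   30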

Python 3: Creating DataFrame with parsed data

The following data has been parsed from a stock API. The dataframe below defines the headers for each column in the dataset. Is there any way I can link the data to the dataframe, effectively creating a labeled data array/table?
DataFrame
df = pd.DataFrame(columns=['Date','Close','High','Low','Open','Volume'])
DataSet
20140502,36.8700,37.1200,36.2100,36.5900,22454100
20140505,36.9100,37.0500,36.3000,36.6800,13129100
20140506,36.4900,37.1700,36.4800,36.9400,19156000
20140507,34.0700,35.9900,33.6700,35.9900,66062700
20140508,33.9200,34.5700,33.6100,33.8800,30407700
20140509,33.7600,34.1000,33.4100,34.0100,20303400
20140512,34.4500,34.6000,33.8700,33.9900,22520600
20140513,34.4000,34.6900,34.1700,34.4300,12477100
20140514,34.1700,34.6500,33.9800,34.4800,17039000
20140515,33.8000,34.1900,33.4000,34.1800,18879800
20140516,33.4100,33.6600,33.1000,33.6600,18847100
20140519,33.8900,33.9900,33.2800,33.4100,14845700
20140520,33.8700,34.4700,33.6700,33.9900,18596700
20140521,34.3600,34.3900,33.8900,34.0000,13804500
20140522,34.7000,34.8600,34.2600,34.6000,17522800
20140523,35.0200,35.0800,34.5100,34.8500,16294400
20140527,35.1200,35.1300,34.7300,35.0000,13057000
20140528,34.7800,35.1700,34.4200,35.1500,16960500
20140529,34.9000,35.1000,34.6700,34.9000,9780800
20140530,34.6500,34.9300,34.1300,34.9200,13153000
20140602,34.8700,34.9500,34.2800,34.6900,9178900
20140603,34.6500,34.9700,34.5800,34.8000,6557500
20140604,34.7300,34.8300,34.2600,34.4800,9434100
I'm assuming that you are receiving the data as a list of lists. So something like -
vals = [[20140502,36.8700,37.1200,36.2100,36.5900,22454100], [20140505,36.9100,37.0500,36.3000,36.6800,13129100], ...]
In that case, you can populate your dataframe with loc -
for index, val in enumerate(vals):
    df.loc[index] = val
Which will give you -
In [6]: df
Out[6]:
       Date  Close   High    Low   Open    Volume
0  20140502  36.87  37.12  36.21  36.59  22454100
1  20140505  36.91  37.05   36.3  36.68  13129100
...
Here, enumerate gives us the index of the row, so we can use that to populate the dataframe index.
If somehow the data was saved as csv, then you can simply use read_csv -
df = pd.read_csv('data.csv', names=['Date','Close','High','Low','Open','Volume'])
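If the rows are already available as a list of lists, a simpler alternative to filling the frame row by row with loc (which is slow for large inputs) is to pass them straight to the constructor; a short sketch:
df = pd.DataFrame(vals, columns=['Date', 'Close', 'High', 'Low', 'Open', 'Volume'])
# optionally parse the integer dates into real timestamps
df['Date'] = pd.to_datetime(df['Date'], format='%Y%m%d')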