Convert multiple downloaded time series share to pandas dataframe - pandas

i downloaded the information about multiple shares using nsepy library for the last 10 days, but could not save it in the pandas dataframe.
Below code to download the multiples share data:
import datetime
from datetime import date
from nsepy import get_history
import pandas as pd
symbol=['SBIN','GAIL','NATIONALUM' ]
data={}
for s in symbol:
data[s]=get_history(s,start=date(2022, 11, 29),end=date(2022, 12, 12))
Below code using to convert the data to pd datafarme, but i am getting error
new = pd.DataFrame(data, index=[0])
new
error message:
ValueError: Shape of passed values is (14, 3), indices imply (1, 3)

Documentation of get_history sais:
Returns:
pandas.DataFrame : A pandas dataframe object
Thus, data is a dict with the symbol as keys and the pd.DataFrames as values. Then you are trying to insert a DataFrame inside of another DataFrame, that does not work. If you want to create a new MultiIndex Dataframe from the 3 existing DataFrames, you can do something like this:
result = {}
for df, symbol in zip(data.values(), data.keys()):
data = df.to_dict()
for key, value in data.items():
result[(symbol, key)] = value
df_multi = pd.DataFrame(result)
df_multi.columns
Result (just showing two columns per Symbol to clarifying the Multiindex structure)
MultiIndex([( 'SBIN', 'Symbol'),
( 'SBIN', 'Series'),
( 'GAIL', 'Symbol'),
( 'GAIL', 'Series'),
('NATIONALUM', 'Symbol'),
('NATIONALUM', 'Series')
Edit
So if you just want a single index DF, like in your attached file with the symbols in a column, you can simply to this:
new_df = pd.DataFrame()
for symbol in data:
# sequentally concat the DataFrames from your dict of DataFrames
new_df = pd.concat([data[symbol], new_df],axis=0)
new_df
Then the output looks like in your file.

Related

Convert a column to a list of prevoius columns in a Dataframe

I would like to create a column that is the form of a list of values from two previous columns, such as location that is made up of the long and lat columns.
This is what the DataFrame looks like
You can create a new columne based on other columns using zip, as follows:
import pandas as pd
df = pd.DataFrame({
'admin_port': ['NORTH SHIELDS', 'OBAN'],
'longitude': [-1.447104, -5.473469],
'latitude': [55.008766, 54.415695],
})
df['new'] = pd.Series(list(zip(df['longitude'].values, df['latitude'].values)))
print(df)
>>> df
admin_port longitude latitude new
0 NORTH SHIELDS -1.447104 55.008766 (-1.447104, 55.008766)
1 OBAN -5.473469 54.415695 (-5.473469, 54.415695)
For your information, you can see how to use zip() here: https://www.w3schools.com/python/ref_func_zip.asp

Streamlit - Applying value_counts / groupby to column selected on run time

I am trying to apply value_counts method to a Dataframe based on the columns selected dynamically in the Streamlit app
This is what I am trying to do:
if st.checkbox("Select Columns To Show"):
all_columns = df.columns.tolist()
selected_columns = st.multiselect("Select", all_columns)
new_df = df[selected_columns]
st.dataframe(new_df)
The above lets me select columns and displays data for the selected columns. I am trying to see how could I apply value_counts/groupby method on this output in Streamlit app
If I try to do the below
st.table(new_df.value_counts())
I get the below error
AttributeError: 'DataFrame' object has no attribute 'value_counts'
I believe the issue lies in passing a list of columns to a dataframe. When you pass a single column in [] to a dataframe, you get back a pandas.Series object (which has the value_counts method). But when you pass a list of columns, you get back a pandas.DataFrame (which doesn't have value_counts method defined on it).
Can you try st.table(new_df[col_name].value_counts())
I think the error is because value_counts() is applicable on a Series and not dataframe.
You can try Converting ".value_counts" output to dataframe
If you want to apply on one single column
def value_counts_df(df, col):
"""
Returns pd.value_counts() as a DataFrame
Parameters
----------
df : Pandas Dataframe
Dataframe on which to run value_counts(), must have column `col`.
col : str
Name of column in `df` for which to generate counts
Returns
-------
Pandas Dataframe
Returned dataframe will have a single column named "count" which contains the count_values()
for each unique value of df[col]. The index name of this dataframe is `col`.
Example
-------
>>> value_counts_df(pd.DataFrame({'a':[1, 1, 2, 2, 2]}), 'a')
count
a
2 3
1 2
"""
df = pd.DataFrame(df[col].value_counts())
df.index.name = col
df.columns = ['count']
return df
val_count_single = value_counts_df(new_df, selected_col)
If you want to apply for all object columns in the dataframe
def valueCountDF(df, object_cols):
c = df[object_cols].apply(lambda x: x.value_counts(dropna=False)).T.stack().astype(int)
p = (df[object_cols].apply(lambda x: x.value_counts(normalize=True,
dropna=False)).T.stack() * 100).round(2)
cp = pd.concat([c,p], axis=1, keys=["Count", "Percentage %"])
return cp
val_count_df_cols = valueCountDF(df, selected_columns)
And Finally, you can use st.table or st.dataframe to show the dataframe in your streamlit app

how to extract the unique values and its count of a column and store in data frame with index key

I am new to pandas.I have a simple question:
how to extract the unique values and its count of a column and store in data frame with index key
I have tried to:
df = df1['Genre'].value_counts()
and I am getting a series but I don't know how to convert it to data frame object.
Pandas series has a .to_frame() function. Try it:
df = df1['Genre'].value_counts().to_frame()
And if you wanna "switch" the rows to columns:
df = df1['Genre'].value_counts().to_frame().T
Update: Full example if you want them as columns:
import pandas as pd
import numpy as np
np.random.seed(400) # To reproduce random variables
df1 = pd.DataFrame({
'Genre': np.random.choice(['Comedy','Drama','Thriller'], size=10)
})
df = df1['Genre'].value_counts().to_frame().T
print(df)
Returns:
Thriller Comedy Drama
Genre 5 3 2
try
df = pd.DataFrame(df1['Genre'].value_counts())

How to create a DataFrame with index names different from `row` and write data into (`index`, `column`) pairs in Julia?

How can I create a DataFrame with Julia with index names that are different from Row and write values into a (index,column) pair?
I do the following in Python with pandas:
import pandas as pd
df = pd.DataFrame(index = ['Maria', 'John'], columns = ['consumption','age'])
df.loc['Maria']['age'] = 52
I would like to do the same in Julia. How can I do this? The documentation shows a DataFrame similar to the one I would like to construct but I cannot figure out how.

when reading an html (pandas.read_html), how to select dataframe and set_ index in one line

I'm reading an html which brings back a list of dataframes. I want to be able to choose the dataframe from the list and set my index (index_col) in the least amount of lines.
Here is what I have right now:
import pandas as pd
df =pd.read_html('http://finviz.com/insidertrading.ashx?or=-10&tv=100000&tc=1&o=-transactionvalue', header = 0)
df2 =df[4] #here I'm assigning df2 to dataframe#4 from the list of dataframes I read
df2.set_index('Date', inplace =True)
Is it possible to do all this in one line? Do I need to create another dataframe (df2) to assign one dataframe from a list, or is it possible I can assign the dataframe as soon as I read the list of dataframes (df).
Thanks.
Anyway:
import pandas as pd
df = pd.read_html('http://finviz.com/insidertrading.ashx?or=-10&tv=100000&tc=1&o=-transactionvalue', header = 0)[4].set_index('Date')