Python: Convert entire column to dictionary - pandas

I recently started using pandas.
I have a DataFrame that looks like this:
import pandas as pd
locations=pd.read_csv('locations.csv')
     lat    lon
0  30.29 -87.44
1  30.21 -87.44
2  31.25 -87.41
I want to convert it to something like this
{'lat': [37.974508, 38.050247, 37.985352],
'lon': [-87.582584, -87.540012, -87.50776]}

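Check to_dict, spelling the orientation out as 'list' (the short form 'l' was deprecated and is rejected by pandas 2.x):
locations.to_dict('list')
Out[951]: {'lat': [30.29, 30.21, 31.25], 'lon': [-87.44, -87.44, -87.41]}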

Keys are column names, values are lists of column data
locations.to_dict('list')

Try this:
lat_lon = {'lat': list(locations['lat']), 'lon': list(locations['lon'])}
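
As a minimal sketch of the answers above, assuming the three-row frame from the question is built inline rather than read from locations.csv, the 'list' orientation and a hand-built dictionary give the same mapping:

```python
import pandas as pd

# Stand-in for pd.read_csv('locations.csv'); the values are copied from the question.
locations = pd.DataFrame({
    'lat': [30.29, 30.21, 31.25],
    'lon': [-87.44, -87.44, -87.41],
})

# Column name -> list of that column's values.
as_lists = locations.to_dict('list')

# The same mapping built by hand, one column at a time.
by_hand = {col: locations[col].tolist() for col in locations.columns}

print(as_lists)
```

The comprehension is only useful when a subset of columns is wanted; for the whole frame, to_dict('list') is shorter.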

Related

Convert multiple downloaded time series share to pandas dataframe

I downloaded data for multiple shares over the last 10 days using the nsepy library, but could not store it in a pandas DataFrame.
The code below downloads the data for several shares:
import datetime
from datetime import date
from nsepy import get_history
import pandas as pd
symbol = ['SBIN', 'GAIL', 'NATIONALUM']
data = {}
for s in symbol:
    data[s] = get_history(s, start=date(2022, 11, 29), end=date(2022, 12, 12))
I use the code below to convert the data to a pandas DataFrame, but I get an error:
new = pd.DataFrame(data, index=[0])
new
Error message:
ValueError: Shape of passed values is (14, 3), indices imply (1, 3)
The documentation of get_history says:
Returns:
pandas.DataFrame : A pandas dataframe object
Thus, data is a dict with the symbols as keys and pandas DataFrames as values. Passing it to pd.DataFrame tries to place each DataFrame inside another DataFrame, which does not work. If you want to build a new DataFrame with MultiIndex columns from the three existing DataFrames, you can do something like this:
result = {}
for symbol, df in data.items():
    # map each column of this symbol's frame to a (symbol, column) key
    for key, value in df.to_dict().items():
        result[(symbol, key)] = value
df_multi = pd.DataFrame(result)
df_multi.columns
Result (showing just two columns per symbol to clarify the MultiIndex structure):
MultiIndex([(      'SBIN', 'Symbol'),
            (      'SBIN', 'Series'),
            (      'GAIL', 'Symbol'),
            (      'GAIL', 'Series'),
            ('NATIONALUM', 'Symbol'),
            ('NATIONALUM', 'Series')])
Edit
So if you just want a single-index DataFrame, like in your attached file with the symbols in a column, you can simply do this:
# concatenate the DataFrames from your dict of DataFrames row-wise, in dict order
new_df = pd.concat(list(data.values()), axis=0)
new_df
The output then looks like the one in your file.
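
Both layouts discussed in this answer can also be produced directly with pd.concat. The sketch below does not call nsepy; the two small frames and their Close values are hypothetical stand-ins for what get_history returns.

```python
import pandas as pd

# Hypothetical stand-ins for the frames returned by get_history:
# one small frame per symbol, indexed by date. The prices are made up.
dates = pd.to_datetime(['2022-11-29', '2022-11-30'])
data = {
    'SBIN': pd.DataFrame({'Symbol': 'SBIN', 'Close': [600.0, 602.5]}, index=dates),
    'GAIL': pd.DataFrame({'Symbol': 'GAIL', 'Close': [92.0, 93.1]}, index=dates),
}

# Wide layout: the dict keys become the outer level of a column MultiIndex.
wide = pd.concat(data, axis=1)

# Long layout: the frames are stacked row-wise in dict order.
stacked = pd.concat(list(data.values()), axis=0)

print(wide.columns.tolist())
print(stacked.shape)
```

The wide layout keeps one row per date, while the long layout repeats the dates once per symbol and relies on the Symbol column to tell the rows apart.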

Cleaning DataFrame columns words starting with defined character

I have a Pandas DataFrame that I would like to clean a little bit.
import pandas as pd
data = ['This is awesome', '$BTC $USD Short the market', 'Dont miss the dip on $ETH']
df = pd.DataFrame(data)
print(df)
I am trying to delete all words starting with "$" such as "$BTC", "$USD", etc. Can't figure out what to do. Convert the column to a list? Would like to use the function startswith() but don't know exactly how... thanks for your help!
Code:
import pandas as pd
data = ['This is awesome', '$BTC $USD Short the market', 'Dont miss the dip on $ETH']
df = pd.DataFrame(data, columns=['data'])
# remove every word that starts with "$", then trim the leftover whitespace
df['data'] = df['data'].str.replace(r'\$\w+', '', regex=True).str.strip()
df
Output:
                   data
0       This is awesome
1      Short the market
2  Dont miss the dip on
Ref link: remove words starting with "#" in a column from a dataframe
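
Since the question asks about startswith(), here is a hedged sketch of that route. The helper name drop_dollar_words and the clean column are made up for illustration, and the sample strings are written without the stray backslash.

```python
import pandas as pd

def drop_dollar_words(text):
    """Return text without the whitespace-separated words that start with '$'."""
    kept = [word for word in text.split() if not word.startswith('$')]
    return ' '.join(kept)

df = pd.DataFrame(
    {'data': ['This is awesome', '$BTC $USD Short the market', 'Dont miss the dip on $ETH']}
)

# Apply the helper to every string in the column.
df['clean'] = df['data'].apply(drop_dollar_words)
print(df['clean'].tolist())
```

Splitting and re-joining also collapses repeated spaces, which a plain regex replacement leaves behind. For large columns the vectorised regex version is usually the faster choice.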

Convert a column to a list of previous columns in a DataFrame

I would like to create a column that is the form of a list of values from two previous columns, such as location that is made up of the long and lat columns.
This is what the DataFrame looks like
You can create a new column based on other columns using zip, as follows:
import pandas as pd
df = pd.DataFrame({
'admin_port': ['NORTH SHIELDS', 'OBAN'],
'longitude': [-1.447104, -5.473469],
'latitude': [55.008766, 54.415695],
})
df['new'] = pd.Series(list(zip(df['longitude'].values, df['latitude'].values)))
print(df)
      admin_port  longitude   latitude                     new
0  NORTH SHIELDS  -1.447104  55.008766  (-1.447104, 55.008766)
1           OBAN  -5.473469  54.415695  (-5.473469, 54.415695)
For your information, you can see how to use zip() here: https://www.w3schools.com/python/ref_func_zip.asp
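
zip produces tuples. If actual lists are wanted, as the question's wording suggests, one possible variant is to convert the two-column selection row by row. This is a sketch on the same sample data; the column name location is an assumption taken from the question text.

```python
import pandas as pd

df = pd.DataFrame({
    'admin_port': ['NORTH SHIELDS', 'OBAN'],
    'longitude': [-1.447104, -5.473469],
    'latitude': [55.008766, 54.415695],
})

# Each row of the two-column selection becomes one [longitude, latitude] list.
df['location'] = df[['longitude', 'latitude']].values.tolist()
print(df['location'].tolist())
```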

How to extract the unique values and their counts from a column and store them in a DataFrame with an index key

I am new to pandas. I have a simple question:
How do I extract the unique values of a column and their counts, and store them in a DataFrame with an index key?
I have tried:
df = df1['Genre'].value_counts()
and I get a Series, but I don't know how to convert it to a DataFrame object.
A pandas Series has a .to_frame() method. Try it:
df = df1['Genre'].value_counts().to_frame()
And if you want to switch the rows to columns:
df = df1['Genre'].value_counts().to_frame().T
Update: a full example if you want them as columns:
import pandas as pd
import numpy as np
np.random.seed(400) # To reproduce random variables
df1 = pd.DataFrame({
'Genre': np.random.choice(['Comedy','Drama','Thriller'], size=10)
})
df = df1['Genre'].value_counts().to_frame().T
print(df)
Returns (on pandas 1.x; from pandas 2.0 on, the row label is count rather than Genre):
Thriller Comedy Drama
Genre 5 3 2
Alternatively, pass the Series to the DataFrame constructor:
df = pd.DataFrame(df1['Genre'].value_counts())
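
If the genres are wanted as an ordinary column next to their counts rather than as the index, one option is to name the index and reset it. This is a sketch with made-up sample data; the column name count is an arbitrary choice.

```python
import pandas as pd

# Made-up sample data for illustration.
df1 = pd.DataFrame({'Genre': ['Drama', 'Comedy', 'Drama', 'Thriller', 'Drama', 'Comedy']})

counts = (
    df1['Genre']
    .value_counts()
    .rename_axis('Genre')        # name the index after the source column
    .reset_index(name='count')   # move the index into a column and name the counts
)
print(counts)
```

Naming both the index and the value column explicitly keeps the result the same regardless of how the installed pandas version labels the output of value_counts.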

Convert Pandas DataFrame into Series with multiIndex

Let us consider a pandas DataFrame (df) like the one shown above.
How do I convert it to a pandas Series?
Just select the single column of your frame. The selection is already a Series and keeps the frame's index, including a MultiIndex, so no extra conversion is needed:
result = df['Count']
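
The frame from the question is not reproduced here, so the sketch below builds a hypothetical one-column frame with a two-level index (the level names and values are made up) to show that column selection preserves the MultiIndex.

```python
import pandas as pd

# Hypothetical frame: index levels and values are invented for illustration.
index = pd.MultiIndex.from_tuples(
    [('a', 1), ('a', 2), ('b', 1)], names=['group', 'item']
)
df = pd.DataFrame({'Count': [10, 20, 30]}, index=index)

by_name = df['Count']             # select the column by label
squeezed = df.squeeze('columns')  # collapse a one-column frame to a Series

print(type(by_name).__name__, by_name.index.nlevels)
```

squeeze is convenient when the column name is not known in advance, but it only returns a Series when the frame has exactly one column.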