How to generate a pandas DataFrame from an OrderedDict?

How can I generate a pandas DataFrame from an OrderedDict?
I have tried the DataFrame.from_dict method, but it does not give me the expected DataFrame.
What is the best approach to convert an OrderedDict into a list of dicts?

Older versions of pandas had a bug: the key ordering of an OrderedDict was not respected when it was converted to a DataFrame via the from_dict call. This was fixed in pandas 0.11.
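For illustration, here is a minimal sketch on a recent pandas. The keys and values are made up, since the question does not show its data. from_dict should keep the insertion order of the keys as column order, and to_dict(orient="records") gives the list of dicts the question asks about:

```python
from collections import OrderedDict

import pandas as pd

# Hypothetical sample data; the question does not show the real dict
od = OrderedDict([("b", [1, 2]), ("a", [3, 4]), ("c", [5, 6])])

# Columns follow the insertion order of the OrderedDict keys
df = pd.DataFrame.from_dict(od)

# One dict per row, if a list of dicts is what is needed
records = df.to_dict(orient="records")
```

If the keys should become rows instead of columns, from_dict also accepts orient="index".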

Related

Can not infer schema for type when converting pandas dataframe to pyspark dataframe

I am trying to use pyspark.pandas to read an Excel file, and I need to convert the pandas DataFrame to a PySpark DataFrame.
df = pandas.read_excel(filepath, sheet_name="A", skiprows=12, usecols="B:AM", parse_dates=True)
pyspark_df = spark.createDataFrame(df)
When I do this, I get the error:
TypeError: Can not infer schema for type:
Even though I tried specifying the dtype for read_excel and defining the schema, I still get the error.
df = pandas.read_excel(filepath, sheet_name="A", skiprows=12, usecols="B:AM", parse_dates=True, dtype=dtypetest)
pyspark_df = spark.createDataFrame(df, schema)
Could you tell me how to solve it?

create a dask dataframe from a dictionary

I have a dictionary like this:
d = {'Caps': 'cap_list', 'Term': 'unique_tokens', 'LocalFreq': 'local_freq_list','CorpusFreq': 'corpus_freq_list'}
I want to create a dask DataFrame from it. How do I do it? Normally, in pandas, it can easily be imported into a DataFrame with:
df = pd.DataFrame({'Caps': cap_list, 'Term': unique_tokens, 'LocalFreq': local_freq_list,
                   'CorpusFreq': corpus_freq_list})
Should I first load into a bag and then convert from bag to ddf?
If your data fits in memory then I encourage you to use Pandas instead of Dask Dataframe.
If for some reason you still want to use Dask dataframe then I would convert things to a Pandas dataframe and then use the dask.dataframe.from_pandas function.
import dask.dataframe as dd
import pandas as pd
df = pd.DataFrame(...)
ddf = dd.from_pandas(df, npartitions=20)
But in many cases this will be slower than simply using pandas well.

Pandas- Groupby Plot is not working for object

I am new to pandas and am analysing a CSV file. I have successfully read the CSV and displayed all the details. Two of the columns have object dtype, and I need to plot them. I have done a groupby on those two columns and can get the first row of each group as well as all the data. However, I am not sure how to plot these object-type columns in pandas. Below is a sample of my groupby on event_type and event_description, which are the columns I need to plot. If I could plot Application and Network for event_type, that would be a great help.
import pandas as pd
data = pd.read_csv('/Users/temp/Downloads/sample.csv')
data.head()
grouped_df = data.groupby(["event_type", "event_description"])
grouped_df.first()
As commented, more information is needed, but if I understand correctly, try this on the DataFrame from the question:
data['event_type'].value_counts(sort=True).plot(kind='barh')
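Object columns cannot be plotted directly, so the idea is to turn them into counts first. Here is a rough sketch of that step. The sample rows are invented because the real CSV is not shown, and the plot calls are left as comments so the snippet runs without matplotlib:

```python
import pandas as pd

# Hypothetical stand-in for the CSV in the question
data = pd.DataFrame({
    "event_type": ["Application", "Network", "Application", "Network", "Application"],
    "event_description": ["crash", "timeout", "crash", "dropped", "slow"],
})

# Number of rows per event_type
type_counts = data["event_type"].value_counts()

# Number of rows per (event_type, event_description) pair,
# with one column per description and 0 where a pair never occurs
pair_counts = (
    data.groupby(["event_type", "event_description"])
    .size()
    .unstack(fill_value=0)
)

# With matplotlib installed, either table can be plotted, e.g.:
# type_counts.plot(kind="barh")
# pair_counts.plot(kind="barh", stacked=True)
```

The numeric counts are what get plotted; the object values only end up as axis labels.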

Linear 1D interpolation using the interp1d function in Python on pandas DataFrame columns

I'm trying to use the interp1d function from scipy.interpolate to build an interpolation from two columns of a pandas DataFrame. I'm using Python 2.7. I'm able to generate the interpolation without errors, but it fails to give any reasonable output even when the supplied values lie within the bounds of the data. For example, in the DataFrame DF (16 columns x 200 rows), the column 'X-Co ordinate' ranges over 0.5-10.5, while the 'Y-Co ordinate' column ranges over 1.5-99.4. I have generated the interpolation as follows:
from scipy.interpolate import interp1d
import pandas as pd
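DF = pd.DataFrame()  # This dummy dataframe will have the columns and rows as described above
# The keyword is bounds_error (not bound_error)
InterpolatedFunction = interp1d(DF['X-Co ordinate'], DF['Y-Co ordinate'], bounds_error=False)
# Evaluate by calling the returned function, not interp1d itself
InterpolatedValue_For_X_Equals_5 = InterpolatedFunction(5)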
Give the pandas built-in method df.interpolate(method='linear') a try.
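If you want to stay with scipy, here is a small sketch of how interp1d is normally used. The six sample points are invented to match the ranges in the question, not taken from the real data. The object returned by interp1d is what gets called with the new x value:

```python
import pandas as pd
from scipy.interpolate import interp1d

# Hypothetical sample standing in for the 200-row DataFrame in the question
DF = pd.DataFrame({
    "X-Co ordinate": [0.5, 2.5, 4.5, 6.5, 8.5, 10.5],
    "Y-Co ordinate": [1.5, 20.0, 40.0, 60.0, 80.0, 99.4],
})

# Build the interpolating function; with bounds_error=False,
# x values outside the data range return NaN instead of raising
f = interp1d(DF["X-Co ordinate"], DF["Y-Co ordinate"],
             kind="linear", bounds_error=False)

value_at_5 = float(f(5))   # x=5 lies between the samples at 4.5 and 6.5
outside = float(f(20))     # x=20 is outside [0.5, 10.5]
```

Note that df.interpolate fills NaN gaps inside existing columns, whereas interp1d evaluates the fitted function at arbitrary new x values, so which one fits depends on what you need.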

create dask DataFrame from a list of dask Series

I need to create a dask DataFrame from a set of dask Series,
analogous to constructing a pandas DataFrame from lists:
pd.DataFrame({'l1': list1, 'l2': list2})
I am not seeing anything in the API. The dask DataFrame constructor is not supposed to be called by users directly and takes a computation graph as its main argument.
In general I agree that it would be nice for the dd.DataFrame constructor to behave like the pd.DataFrame constructor.
If your series have well defined divisions then you might try dask.dataframe.concat with axis=1.
You could also try converting one of the series into a DataFrame and then use assignment syntax:
L = [...]  # list of dask Series
df = L[0].to_frame()
for s in L[1:]:
    df[s.name] = s