Why does mean imputation convert a float64 column to object, and why won't it convert back? - pandas

I just did mean imputation on a pandas DataFrame, but after imputation the column is converted to object type and will not convert back to float. I tried astype(float), which says it needs a string, not a method; to_numeric() does not work either.
# mean imputation
mean_age = df['Age'].mean
df['Mean_Age'] = df['Age'].fillna(mean_age)
# This makes the column object, and neither astype() nor to_numeric() converts it back to float
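The likely culprit is the missing parentheses on the first line: df['Age'].mean is the bound method itself, not the mean value, so fillna fills the NaNs with a method object and the column becomes object dtype; float() then complains it got a method, which matches the reported astype error. A minimal corrected sketch, using a made-up DataFrame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Age": [22.0, np.nan, 30.0]})

# .mean() with parentheses returns the numeric mean (a float);
# without parentheses it is the bound method object.
mean_age = df["Age"].mean()

df["Mean_Age"] = df["Age"].fillna(mean_age)
print(df["Mean_Age"].dtype)  # float64
```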

NumPy array typecasting

I have the Python code below:
import numpy as np
a = np.array([1, 2, '3'])
print(a)
output:
['1' '2' '3']
My question is: why are all the elements converted into strings?
I know that if a NumPy array consists of elements of different types, they will be typecast to a common type. But on what basis will they be typecast?
This is fairly well explained in the numpy.array documentation (highlighting is mine):
numpy.array(object, dtype=None, *, copy=True, order='K', subok=False, ndmin=0, like=None)
[…]
dtype: data-type, optional
The desired data-type for the array. If not given, then the type will be determined as the minimum type required to hold the objects in the sequence.
An integer can always be converted to a string; the other way around is not always possible (e.g., 'a' cannot be converted to an integer).
The same happens if you mix floats and integers: the array will be cast as float.
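A quick sketch of both promotions described above:

```python
import numpy as np

# Ints mixed with a string: the minimum common type is a string dtype.
print(np.array([1, 2, '3']).dtype.kind)  # 'U' (Unicode string)

# Ints mixed with a float: the minimum common type is float64.
print(np.array([1, 2, 3.0]).dtype)  # float64
```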

Can't convert Matrix to DataFrame in Julia

How can I convert a matrix to a DataFrame in Julia?
I have a 10×2 Matrix{Any}, and when I try to convert it to a DataFrame using this:
df2 = convert(DataFrame, Xt2)
I get this error:
MethodError: Cannot `convert` an object of type Matrix{Any} to an object of type DataFrame
Try instead
df2 = DataFrame(Xt2,:auto)
You cannot use convert for this; you can use the DataFrame constructor, but then as the documentation (simply type ? DataFrame in the Julia REPL) will tell you, you need to either provide a vector of column names, or :auto to auto-generate column names.
Tangentially, I would also strongly recommend avoiding Matrix{Any} (or really anything involving Any) for any scenario where performance is at all important.

Writing data frame with object dtype to HDF5 only works after converting to string

I have a large DataFrame and I want to write it to disk for quick retrieval. I believe to_hdf(...) infers the data types of the columns and sometimes gets them wrong. I wonder what the correct way is to cope with this.
import pandas as pd
import numpy as np
length = 10
df = pd.DataFrame({"a": np.random.randint(1e7, 1e8, length),})
# df.loc[1, "a"] = "abc"
# df["a"] = df["a"].astype(str)
print(df.dtypes)
df.to_hdf("df.hdf5", key="data", format="table")
Uncommenting the various lines leads me to the following observations.
Just filling the column with numbers leads to data type int32, and it stores without problems.
Setting one element to "abc" changes the dtype to object, but it seems that to_hdf internally infers another data type and throws an error: TypeError: object of type 'int' has no len()
Explicitly converting the column to str leads to success, and to_hdf stores the data.
Now I am wondering what is happening in the second case, and is there a way to prevent this? The only way I found was to go through all columns, check if they are dtype('O'), and explicitly convert them to str.
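The column-by-column workaround described above might be sketched like this (the helper name is made up for illustration):

```python
import pandas as pd

def stringify_object_columns(df: pd.DataFrame) -> pd.DataFrame:
    # Convert every object-dtype column to str so to_hdf can store it.
    out = df.copy()
    for col in out.columns:
        if out[col].dtype == object:
            out[col] = out[col].astype(str)
    return out

df = pd.DataFrame({"a": [1, "abc", 3]})
print(stringify_object_columns(df)["a"].tolist())  # ['1', 'abc', '3']
```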
Instead of using HDF5, I have found a generic pickling library which seems to be perfect for the job: joblib.
Storing and loading data is straightforward:
import joblib
joblib.dump(df, "file.jl")
df2 = joblib.load("file.jl")

Which float precision do NumPy arrays have by default?

I wonder which floating-point format floats have in a NumPy array by default.
(Or do they even get converted when declaring an np.array? If so, how about Python lists?)
E.g., float16, float32 or float64?
float64. You can check it like this:
>>> np.array([1, 2]).dtype
dtype('int64')
>>> np.array([1., 2]).dtype
dtype('float64')
If you don't specify the data type when you create the array, then NumPy will infer the type. From the docs:
dtype : data-type, optional - The desired data-type for the array. If not given, then the type will be determined as the minimum type required to hold the objects in the sequence.
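To illustrate the default inference and how to override it explicitly:

```python
import numpy as np

# Default inference gives float64 for Python floats...
print(np.array([1.0, 2.0]).dtype)  # float64

# ...but you can request a narrower type via the dtype argument.
print(np.array([1.0, 2.0], dtype=np.float32).dtype)  # float32
```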

How to convert all values in a pandas DataFrame to string according to display settings?

How can I turn a DataFrame into DataFrame of strings according to the same rules that str(df) uses?
I tried df.astype("str") and df.applymap(str), but both left floats with larger precision than indicated by the display.precision setting.
Use .round() before converting to str:
p = pd.get_option('display.precision')
df.round(p).astype(str)
Pandas rounds numerical data when you try to display it, to the precision specified by display.precision; the data is still stored by its full precision.
Directly casting to str results in pandas using the full precision of the float; it is independent of whatever setting you have for display.precision.
You can use applymap with a format string, e.g.:
df.applymap(lambda x: '{0:.2f}'.format(x))
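Putting the round-then-convert approach together (the values here are made up for illustration):

```python
import pandas as pd

pd.set_option("display.precision", 2)
df = pd.DataFrame({"x": [1.23456, 2.34567]})

# Round to the display precision, then stringify.
p = pd.get_option("display.precision")
out = df.round(p).astype(str)
print(out["x"].tolist())  # ['1.23', '2.35']
```

Note that in recent pandas releases applymap has been renamed to DataFrame.map, so the format-string variant there would be df.map(lambda x: '{0:.2f}'.format(x)).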