pandas loc does not preserve data type

Pandas loc indexing does not preserve the datatype of subarrays. Consider the following code:
import pandas as pd
s = pd.Series([1,2,"hi","bye"])
print(s) # dtype: object
print(s.loc[[0]]) # dtype: object
print(type(s.loc[0])) # <class 'int'>
I would like s.loc[[0]] to return a Series with dtype int, rather than object as it currently does.

You can chain .astype(original dtype).astype(preferred dtype) onto the selection, e.g. for your case:
import pandas as pd
s = pd.Series([1,2,"hi","bye"])
print(s)
print(s.loc[[0]].astype(str).astype(int))
Result:
0    1
dtype: int32
(int32 here because astype(int) follows the platform's default integer width; most 64-bit Linux/macOS builds would show int64.) Hope this is useful.
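An alternative sketch, if the goal is a numeric result without the double astype round-trip: pd.to_numeric converts the object-dtype subseries directly.

```python
import pandas as pd

s = pd.Series([1, 2, "hi", "bye"])

# The single-row selection keeps dtype object; pd.to_numeric
# re-infers a numeric dtype for the selected values.
sub = pd.to_numeric(s.loc[[0]])
print(sub.dtype.kind)  # 'i' (a signed integer dtype)
```

This avoids the intermediate string conversion, but it will raise on any selection that contains non-numeric values.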

Related

How to change a specific value of a row of a pandas Series?

I have the following pandas Series:
trade dtype
trade_action category
execution_venue object
from_implied int64
In the last row I would like to change the from_implied name to implied. How can I do this?
Expected output:
trade dtype
trade_action category
execution_venue object
implied int64
Here's what you can do:
import pandas as pd
ser = pd.Series(["trade", "trade_action", "execution_venue", "from_implied"])
ser2 = ser.replace(to_replace="from_implied", value="implied")
With ser.replace you can change the values of a pd.Series as above.
Assuming that the first column is your Series' index, you can use the pd.Series.rename method on your series:
import pandas as pd
# Your series here
series = pd.read_clipboard().set_index("trade")["dtype"]
out = series.rename({"from_implied": "implied"})
out:
trade
trade_action category
execution_venue object
implied int64
Name: dtype, dtype: object
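For readers without the clipboard contents, the same rename can be sketched on a hand-built Series (the labels below are copied from the question):

```python
import pandas as pd

# The dtypes listing is a Series whose index holds the column names,
# so rename() with a mapping relabels the matching index entry.
series = pd.Series(["category", "object", "int64"],
                   index=["trade_action", "execution_venue", "from_implied"],
                   name="dtype")
series.index.name = "trade"

out = series.rename({"from_implied": "implied"})
print(out.index.tolist())  # ['trade_action', 'execution_venue', 'implied']
```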

Pandas series pad function not working with apply in pandas

I am trying to pad the columns of my pandas DataFrame with different characters. Using the apply function with zfill to left-fill with '0' works:
print(df["Date"].apply(lambda x: x.zfill(10)))
But when I try to use pad function using apply method to my dataframe I face error:
AttributeError: 'str' object has no attribute 'pad'
The code I am trying is:
print(df["Date"].apply(lambda x: x.pad(10, side="left", fillchar="0")))
Both the zfill and pad functions are a part of pandas.Series.str. I am confused why pad is not working and zfill works. How can I achieve this functionality?
Full code:
import pandas as pd
from io import StringIO
StringData = StringIO(
"""Date,Time
パンダ,パンダ
パンダサンDA12-3,パンダーサンDA12-3
パンダサンDA12-3,パンダサンDA12-3
"""
)
df = pd.read_csv(StringData, sep=",")
print(df["Date"].apply(lambda x: x.zfill(10)))  # works
print(df["Date"].apply(lambda x: x.pad(10, side="left", fillchar="0")))  # doesn't work
I am using pandas 1.5.1.
You should not use apply here at all: apply passes each element to the lambda as a plain Python str, and while str has a zfill method, it has no pad method (pad only exists on the pandas .str accessor). Use the Series string methods directly:
print(df["Date"].str.zfill(10))
print(df["Date"].str.pad(10, side="left", fillchar="0"))
output:
0 0000000パンダ
1 パンダサンDA12-3
2 パンダサンDA12-3
Name: Date, dtype: object
0 0000000パンダ
1 パンダサンDA12-3
2 パンダサンDA12-3
Name: Date, dtype: object
multiple columns:
Now, you need to use apply, but this is DataFrame.apply, not Series.apply:
df[['col1', 'col2', 'col3']].apply(lambda s: s.str.pad(10, side="left", fillchar="0"))
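A runnable sketch of the multi-column case, with made-up column names col1..col3 (any string columns would do):

```python
import pandas as pd

df = pd.DataFrame({"col1": ["a"], "col2": ["bb"], "col3": ["ccc"]})

# DataFrame.apply hands each column to the lambda as a Series,
# so the .str accessor is available again inside the lambda.
out = df[["col1", "col2", "col3"]].apply(
    lambda s: s.str.pad(5, side="left", fillchar="0"))
print(out.iloc[0].tolist())  # ['0000a', '000bb', '00ccc']
```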

dtype definition for pandas dataframe with columns of VARCHAR or String

I want to get some data in a dictionary that need to go into a pandas dataframe.
The dataframe is later written in a PostgreSQL table using sqlalchemy, and I would like to get the right column types.
Hence, I specify the dtypes for the dataframe
dtypes = {"forretningshændelse": sqlalchemy.types.String(length=8),
          "forretningsområde": sqlalchemy.types.String(length=40),
          "forretningsproces": sqlalchemy.types.INTEGER(),
          "id_namespace": sqlalchemy.types.String(length=100),
          "id_lokalId": sqlalchemy.types.String(length=36),
          "kommunekode": sqlalchemy.types.INTEGER(),
          "registreringFra": sqlalchemy.types.DateTime()}
Later I use df = pd.DataFrame(item_lst, dtype=dtypes), where item_lst is a list of dictionaries.
Regardless of whether I use String(8), String(length=8), or VARCHAR(8) in the dtype definition, pd.DataFrame(item_lst, dtype=dtypes) always fails with TypeError: object of type 'String' has no len().
How do I have to define the dtype to overcome this error?
Instead of forcing data types when the DataFrame is created, let pandas infer the data types (just df = pd.DataFrame(item_lst)) and then use your dtypes dict with to_sql() when you push your DataFrame to the database, like this:
from pprint import pprint
import pandas as pd
import sqlalchemy
engine = sqlalchemy.create_engine("sqlite://")
item_lst = [{"forretningshændelse": "foo"}]
df = pd.DataFrame(item_lst)
print(df.info())
"""
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1 entries, 0 to 0
Data columns (total 1 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 forretningshændelse 1 non-null object
dtypes: object(1)
memory usage: 136.0+ bytes
None
"""
dtypes = {"forretningshændelse": sqlalchemy.types.String(length=8)}
df.to_sql("tbl", engine, index=False, dtype=dtypes)
insp = sqlalchemy.inspect(engine)
pprint(insp.get_columns("tbl"))
"""
[{'autoincrement': 'auto',
'default': None,
'name': 'forretningshændelse',
'nullable': True,
'primary_key': 0,
'type': VARCHAR(length=8)}]
"""
I believe you are confusing the dtypes within the DataFrame with the dtypes on the SQL table itself.
You probably don't need to manually specify the datatypes in pandas itself but if you do, here's how.
Spoiler alert: the pandas.DataFrame documentation says the dtype argument accepts only a single dtype, not a per-column mapping, so you would need loops or per-column work to get different types.
To solve your problem:
import pandas as pd
import sqlalchemy
engine = sqlalchemy.create_engine("connection_string")
df = pd.DataFrame(item_list)
dtypes = {"forretningshændelse": sqlalchemy.types.String(length=8),
          "forretningsområde": sqlalchemy.types.String(40),
          "forretningsproces": sqlalchemy.types.INTEGER(),
          "id_namespace": sqlalchemy.types.String(100),
          "id_lokalId": sqlalchemy.types.String(36),
          "kommunekode": sqlalchemy.types.INTEGER(),
          "registreringFra": sqlalchemy.types.DateTime()}
with engine.connect() as conn:
    df.to_sql("table_name", if_exists="replace", con=conn, dtype=dtypes)
Tip: avoid special characters in identifiers in general; they only make code harder to maintain at some point :). I assumed you are creating a new SQL table rather than appending; otherwise the column types would already be defined.
Happy Coding!

setting index value to np.uint64 in a pandas dataframe changes its type to int64

I am trying to set the dtype on the index of a pandas DataFrame:
import pandas as pd
import numpy as np
a = pd.DataFrame(columns=['value'])
a.loc[np.uint64(3)] = [3]
# Here I would expect to get UINT64 but it's INT64
a.index.dtype
> dtype('int64')
a.index = a.index.astype(np.uint64)
# Works as expected
a.index.dtype
> dtype('uint64')
a.loc[np.uint64(4)] = [4]
# Now I am getting totally confused. Why do I get float64 ?
a.index.dtype
> dtype('float64')
Why is pandas not using the dtype of the index value passed to loc?
Why does the index dtype change in what seems to be a random way?
What am I missing? What is the proper way to set the index type once and for all?
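One hedged workaround sketch (an assumption, not an official recommendation): enlarging a frame row by row through loc lets pandas re-infer the index dtype on each insert, so collect the rows first and build the index with an explicit dtype once:

```python
import numpy as np
import pandas as pd

# Hypothetical data mirroring the question's inserts.
rows = {np.uint64(3): 3, np.uint64(4): 4}

# Building the index in one shot with an explicit dtype avoids
# the per-insert inference that kept changing the dtype above.
a = pd.DataFrame({"value": list(rows.values())},
                 index=pd.Index(list(rows.keys()), dtype=np.uint64))
print(a.index.dtype)  # uint64
```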

How would I access individual elements of this pandas dataFrame?

How would I access the individual elements of the dataFrame below?
More specifically, how would I retrieve/extract the string "CN112396173" at index 2 of the DataFrame?
Thanks
A more accurate description of your problem would be: "Getting all first words of string column in pandas"
You can use data["PN"].str.split(expand=True)[0]. See the docs.
>>> import pandas as pd
>>> df = pd.DataFrame({"column": ["asdf abc cdf"]})
>>> series = df["column"].str.split(expand=True)[0]
>>> series
0 asdf
Name: 0, dtype: object
>>> series.to_list()
['asdf']
dtype: object is actually normal (in pandas, strings are 'objects').
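To answer the literal question about accessing a single element, here is a sketch; the "PN" column name comes from the answer above, and its contents are invented since the original DataFrame is not shown:

```python
import pandas as pd

# Hypothetical data; only the index-2 value mirrors the question.
df = pd.DataFrame({"PN": ["AA1 x", "BB2 y", "CN112396173 JP2021513227A"]})

# First word of each row, then pick the element at position 2.
first_words = df["PN"].str.split(expand=True)[0]
print(first_words.iloc[2])  # CN112396173

# Single-cell access by row label and column label.
print(df.at[2, "PN"])  # CN112396173 JP2021513227A
```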