I want to get the left value (LD) of the pipe-separated values in the DataFrame column 'CA Distance Nominal (LD | au)'. Here is the code.
When I convert the string to float I get all the values as NaN.
import pandas as pd

cneos = pd.read_csv('cneos.csv')
print(cneos['CA Distance Nominal (LD | au)'].head())
cneos['Distance'] = pd.to_numeric(cneos['CA Distance Nominal (LD | au)'], errors='coerce')
print(cneos['Distance'].head())
Result
0 2.02 | 0.00520
1 0.39 | 0.00100
2 8.98 | 0.02307
3 3.88 | 0.00996
4 4.84 | 0.01244
Name: CA Distance Nominal (LD | au), dtype: object
After to_numeric()
0 NaN
1 NaN
2 NaN
3 NaN
4 NaN
Name: Distance, dtype: float64
How can I get both values, LD and AU, as separate floats?
I'm not sure it's the best way to solve your problem, but it works:
separated_data_frame = pd.DataFrame(cneos['CA Distance Nominal (LD | au)'].apply(lambda x: x.split('|')).to_list())
separated_data_frame.columns = ['LD', 'AU']
separated_data_frame.LD = separated_data_frame.LD.astype(float)
separated_data_frame.AU = separated_data_frame.AU.astype(float)
cneos = cneos.join(separated_data_frame).drop(columns='CA Distance Nominal (LD | au)')
The result is:
LD AU
0 2.02 0.00520
1 0.39 0.00100
2 8.98 0.02307
3 3.88 0.00996
4 4.84 0.01244
Is it what you wanted?
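For what it's worth, a more concise alternative is str.split with expand=True. This is just a sketch, assuming every row has the "number | number" shape shown above:
import pandas as pd

# hypothetical sample mirroring the question's column
cneos = pd.DataFrame({'CA Distance Nominal (LD | au)':
                      ['2.02 | 0.00520', '0.39 | 0.00100']})

split = (cneos['CA Distance Nominal (LD | au)']
         .str.split('|', expand=True)  # two string columns
         .astype(float))               # float() ignores the surrounding spaces
split.columns = ['LD', 'AU']
cneos = cneos.join(split)
print(cneos)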
Related
Can pandas convert these key/value pairs to a customized table? Here is a sample of the data.
1675484100 customer=A.1 area=1 height=20 width={10,10} length=1
1675484101 customer=B.1 area=10 height=30 width={20,11} length=2
1675484102 customer=C.1 area=11 height=40 width={30,12} length=3 remarks=call
Generate a table with each key as a header and its associated value, with the first field as a time.
I would use a regex to get each key/value pair, then reshape:
import pandas as pd

data = '''1675484100 customer=A.1 area=1 height=20 width={10,10} length=1
1675484101 customer=B.1 area=10 height=30 width={20,11} length=2
1675484102 customer=C.1 area=11 height=40 width={30,12} length=3 remarks=call'''
df = (pd.Series(data.splitlines()).radd('time=')
.str.extractall(r'([^\s=]+)=([^\s=]+)')
.droplevel('match').set_index(0, append=True)[1]
# unstack keeping order
.pipe(lambda d: d.unstack()[d.index.get_level_values(-1).unique()])
)
print(df)
Output:
0 time customer area height width length remarks
0 1675484100 A.1 1 20 {10,10} 1 NaN
1 1675484101 B.1 10 30 {20,11} 2 NaN
2 1675484102 C.1 11 40 {30,12} 3 call
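Note that extractall yields strings, so every column of df above is object dtype. If you need numbers afterwards, a small follow-up (assuming the column names from this sample) is:
# customer and width are genuinely non-numeric here, so convert only the rest
for col in ['time', 'area', 'height', 'length']:
    df[col] = pd.to_numeric(df[col])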
Assuming that your input is a string defined as data, you can use this:
import pandas as pd

L = [{k: v for k, v in (x.split("=") for x in l.split()[1:])}
     for l in data.split("\n") if l.strip()]
df = pd.DataFrame(L)
df.insert(0, "time", [pd.to_datetime(int(x.split()[0]), unit="s")
                      for x in data.split("\n") if x.strip()])
Otherwise, if the data is stored in some sort of .txt file, add this at the beginning:
with open("file.txt", "r") as f:
    data = f.read()
Output:
print(df)
time customer area height width length remarks
0 2023-02-04 04:15:00 A.1 1 20 {10,10} 1 NaN
1 2023-02-04 04:15:01 B.1 10 30 {20,11} 2 NaN
2 2023-02-04 04:15:02 C.1 11 40 {30,12} 3 call
I have a df which looks like the below. There are two quantity columns, and I want to move the quantities in the "QTY 2" column into the "QTY" column.
Note: there are no instances where both columns have a value in the same row (so for each row, either QTY is populated or QTY 2 is populated, never both).
DF
Index  Product   QTY  QTY 2
0      Shoes     5
1      Jumpers        10
2      T Shirts       15
3      Shorts    13
Desired Output
Index  Product   QTY
0      Shoes     5
1      Jumpers   10
2      T Shirts  15
3      Shorts    13
Thanks
Try this:
import numpy as np
df['QTY'] = np.where(df['QTY'].isnull(), df['QTY 2'], df['QTY'])
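An equivalent, arguably more idiomatic option is combine_first, which keeps QTY where it is present and falls back to QTY 2. A minimal sketch, assuming the blanks are NaN:
import numpy as np
import pandas as pd

df = pd.DataFrame({'Product': ['Shoes', 'Jumpers', 'T Shirts', 'Shorts'],
                   'QTY': [5, np.nan, np.nan, 13],
                   'QTY 2': [np.nan, 10, 15, np.nan]})

# QTY wins where it exists; its NaNs are filled from QTY 2
df['QTY'] = df['QTY'].combine_first(df['QTY 2'])
df = df.drop(columns='QTY 2')
print(df)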
df["QTY"] = df["QTY"].fillna(df["QTY 2"], downcast="infer")
filling the gaps of QTY with QTY 2:
In [254]: df
Out[254]:
Index Product QTY QTY 2
0 0 Shoes 5.0 NaN
1 1 Jumpers NaN 10.0
2 2 T Shirts NaN 15.0
3 3 Shorts 13.0 NaN
In [255]: df["QTY"] = df["QTY"].fillna(df["QTY 2"], downcast="infer")
In [256]: df
Out[256]:
Index Product QTY QTY 2
0 0 Shoes 5 NaN
1 1 Jumpers 10 10.0
2 2 T Shirts 15 15.0
3 3 Shorts 13 NaN
downcast="infer" makes it "these look like integer after NaNs gone, so make the type integer".
You can drop QTY 2 after this with df = df.drop(columns="QTY 2"). A one-liner is, as usual, possible:
df = (df.assign(QTY=df["QTY"].fillna(df["QTY 2"], downcast="infer"))
        .drop(columns="QTY 2"))
You can do (I am assuming your empty values are empty strings):
df = df.assign(QTY=df[['QTY', 'QTY 2']]
               .replace('', 0)
               .sum(axis=1)).drop('QTY 2', axis=1)
print(df):
Product QTY
0 Shoes 5
1 Jumpers 10
2 T Shirts 15
3 Shorts 13
If the empty values are actually NaNs, then:
df['QTY'] = df['QTY'].fillna(df['QTY 2'])  # or
df['QTY'] = df[['QTY', 'QTY 2']].sum(axis=1)
I have the following DF
ID
0 1.0
1 555555.0
2 NaN
3 200.0
When I try to convert the ID column to Int64, I get the following error:
Cannot convert non-finite values (NA or inf) to integer
I used the following code to try to solve the problem:
df["ID"] = df["ID"].astype('int64', errors='ignore')
However, when I use the above code, my ID column stays float64.
Any tip to solve this problem?
Use pandas' nullable integer dtype pd.Int64Dtype instead of np.int64:
df['ID'] = df['ID'].astype(pd.Int64Dtype())  # NaN becomes <NA>; no fillna needed
Output:
>>> df
ID
0 1
1 555555
2 <NA>
3 200
>>> df['ID'].dtype
Int64Dtype()
>>> df['ID'] + 10
0 11
1 555565
2 <NA>
3 210
Name: ID, dtype: Int64
>>> print(df.to_csv(index=False))
ID
1
555555
""
200
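As a side note, the string alias 'Int64' (capital I) is equivalent to pd.Int64Dtype(), and convert_dtypes() can pick the nullable dtype automatically. A quick sketch:
import numpy as np
import pandas as pd

df = pd.DataFrame({'ID': [1.0, 555555.0, np.nan, 200.0]})

# convert_dtypes() infers the nullable Int64 on its own
print(df.convert_dtypes().dtypes)  # ID    Int64

# or explicitly, using the string alias for pd.Int64Dtype()
df['ID'] = df['ID'].astype('Int64')
print(df['ID'].dtype)              # Int64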
I have a table which looks like this:
df_raw = pd.DataFrame(dict(A = pd.Series(['1.00','-1']), B = pd.Series(['1.0','-45.00','-'])))
A B
0 1.00 1.0
1 -1 -45.00
2 NaN -
I would like to replace '-' with '0.00' using DataFrame.replace(), but it trips over the negative values '-1' and '-45.00'.
How can I leave the negative values untouched and replace only '-' with '0.00'?
My code:
df_raw = df_raw.replace(['-', r'\*'], ['0.00', '0.00'], regex=True).astype(np.float64)
Error:
ValueError: invalid literal for float(): 0.0045.00
Your regex is matching on all - characters:
In [48]:
df_raw.replace(['-', r'\*'], ['0.00', '0.00'], regex=True)
Out[48]:
A B
0 1.00 1.0
1 0.001 0.0045.00
2 NaN 0.00
If you add boundaries so that the pattern only matches when that single character is the whole string, it works as expected:
In [47]:
df_raw.replace(['^-$'], ['0.00'], regex=True)
Out[47]:
A B
0 1.00 1.0
1 -1 -45.00
2 NaN 0.00
Here ^ means start of string and $ means end of string, so the pattern only matches when the entire string is that single character.
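To see the difference outside pandas, here are the same patterns with Python's re module:
import re
print(re.sub('-', '0.00', '-45.00'))    # '0.0045.00': every hyphen is replaced
print(re.sub('^-$', '0.00', '-45.00'))  # '-45.00': no match, string is longer than '-'
print(re.sub('^-$', '0.00', '-'))       # '0.00': the whole string is '-'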
Or you can just use replace without regex=True, which only replaces exact matches:
In [29]:
df_raw.replace('-',0)
Out[29]:
A B
0 1.00 1.0
1 -1 -45.00
2 NaN 0
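If the end goal is numeric columns, another option is to let pd.to_numeric coerce every non-numeric token to NaN and then fill with 0. This is only a sketch; note that it also turns genuine NaNs (like the missing value in A) into 0.0, which may not be what you want:
import pandas as pd

df_raw = pd.DataFrame(dict(A=pd.Series(['1.00', '-1']),
                           B=pd.Series(['1.0', '-45.00', '-'])))

# '-' (and any other non-numeric string) becomes NaN, then 0.0
cleaned = df_raw.apply(pd.to_numeric, errors='coerce').fillna(0.0)
print(cleaned)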
I would like to convert the following dataframe into JSON.
df:
A sector B sector C sector
TTM Ratio -- 35.99 12.70 20.63 14.75 23.06
RRM Sales -- 114.57 1.51 5.02 1.00 4594.13
MQR book 1.48 2.64 1.02 2.46 2.73 2.74
TTR cash -- 14.33 7.41 15.35 8.59 513854.86
In order to do so using df.to_json(), I would need unique column names and indices.
Therefore, what I am looking for is to convert the column names into a row and use default column numbers. In short, I would like the following output:
df:
0 1 2 3 4 5
A sector B sector C sector
TTM Ratio -- 35.99 12.70 20.63 14.75 23.06
RRM Sales -- 114.57 1.51 5.02 1.00 4594.13
MQR book 1.48 2.64 1.02 2.46 2.73 2.74
TTR cash -- 14.33 7.41 15.35 8.59 513854.86
Turning the column names into the first row would let me make the conversion correctly.
You could also use vstack in numpy:
>>> df
x y z
0 8 7 6
1 6 5 4
>>> pd.DataFrame(np.vstack([df.columns, df]))
0 1 2
0 x y z
1 8 7 6
2 6 5 4
The columns become the actual first row in this case.
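Since the end goal was JSON, note that after vstack the column labels are plain 0..N-1 integers, so to_json works directly. A quick sketch; orient='values' is just one of the possible orients:
import numpy as np
import pandas as pd

df = pd.DataFrame({'x': [8, 6], 'y': [7, 5], 'z': [6, 4]})
out = pd.DataFrame(np.vstack([df.columns, df]))
print(out.to_json(orient='values'))
# [["x","y","z"],[8,7,6],[6,5,4]]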
Use assignment with a list made of a range plus the original column names, which creates a MultiIndex:
print (range(len(df.columns)))
range(0, 6)
# in Python 2, list() can be omitted
df.columns = [list(range(len(df.columns))), df.columns]
Or MultiIndex.from_arrays:
df.columns = pd.MultiIndex.from_arrays([range(len(df.columns)), df.columns])
It is also possible to use a RangeIndex:
print (pd.RangeIndex(len(df.columns)))
RangeIndex(start=0, stop=6, step=1)
df.columns = pd.MultiIndex.from_arrays([pd.RangeIndex(len(df.columns)), df.columns])
print (df)
0 1 2 3 4 5
A sector B sector C sector
TTM Ratio -- 35.99 12.70 20.63 14.75 23.06
RRM Sales -- 114.57 1.51 5.02 1.00 4594.13
MQR book 1.48 2.64 1.02 2.46 2.73 2.74
TTR cash -- 14.33 7.41 15.35 8.59 513854.86