Filling NaN values with two different values - pandas

Some values in the FlightNumber column are missing. These numbers are meant to increase by 10 with each row, so 10055 and 10075 need to be put in place. Fill in these missing numbers and make the column an integer column (instead of a float column).
I tried this, but I am not getting the correct result.
df['FlightNumber'].fillna(10055, inplace = True)
df['FlightNumber'].fillna(10075, inplace = True)
df[['FlightNumber']] = df[['FlightNumber']].astype(int)
df

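df.loc[df['FlightNumber'].isna(), 'FlightNumber'] = df.loc[df[df['FlightNumber'].isna()].index - 1, 'FlightNumber'].to_numpy() + 10  # .to_numpy() avoids index alignment
df['FlightNumber'] = df['FlightNumber'].astype(int)  # cast once no NaN values remain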
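The attempt in the question fails because the first fillna call replaces every NaN with 10055, which leaves nothing for the second call to fill. A minimal sketch of an alternative that fills each gap from the row above it is shown here. The sample frame is hypothetical and contains only the FlightNumber column, and the approach assumes the gaps are isolated (no two missing values in a row, and the first row is present).

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: only the FlightNumber column from the question.
df = pd.DataFrame({"FlightNumber": [10045, np.nan, 10065, np.nan, 10085]})

# shift() moves every value down one row, so "previous value + 10" lines up
# with the row it should fill. fillna() only touches the missing entries.
df["FlightNumber"] = (
    df["FlightNumber"].fillna(df["FlightNumber"].shift() + 10).astype(int)
)

print(df)
```

Casting with astype(int) only works once no NaN values remain, which is why it comes last.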

Related

Pandas Replace_ column values

Hello,
I am analyzing the following dataset.
The column ['program_number'] has dtype object, but I want to change it to an integer column.
I have tried to replace some values, but it doesn't work.
As you can see, some values are duplicated: for example, 6 appears as both '6 ' and 6.
How can I resolve this? Many thanks.
UPDATE
Didn't see 1X and 3X at first.
If you need those numbers and just want to remove the X then:
df["Program"] = df["Program"].str.strip(" X").astype(int)
If the column contains data that isn't numeric or shouldn't be converted, you can use pd.to_numeric with errors='coerce'. Cells that can't be converted become NaN. Be aware that this results in floating-point numbers.
df["Program"] = pd.to_numeric(df["Program"], errors="coerce")
old
You want to use str.strip() here, rather than replace.
Try this:
df1['program_number'] = df1['program_number'].str.strip().astype(int)
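The two answers can be combined into one cleaning step. The sketch below is only an illustration: the sample values are hypothetical, and it assumes the column may mix real integers with strings, so it converts everything to text first (the .str methods return NaN for non-string entries). The nullable "Int64" dtype is one way to keep an integer column when some cells cannot be converted.

```python
import pandas as pd

# Hypothetical sample mirroring the kinds of values described in the question.
df = pd.DataFrame({"program_number": ["6 ", 6, " 1X", "3X", "n/a"]})

# Convert everything to text, then strip surrounding spaces and "X" characters.
cleaned = df["program_number"].astype(str).str.strip(" X")

# errors="coerce" turns unparseable entries into NaN; "Int64" (capital I) is
# pandas' nullable integer dtype, so the missing value does not force floats.
df["program_number"] = pd.to_numeric(cleaned, errors="coerce").astype("Int64")

print(df)
```

If every cell is known to be convertible, a plain astype(int) as in the answers above is enough.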

How to calculate the difference between row values based on another column value without filtering the values in between

I want to calculate the difference in seconds between the rows where turn_marker == 1. But when I use the following method, it filters out all the zeros, and I need them because I need the entire data set.
Here you can see my data set with a column called turn_marker that has the values zero and 1, and another column with seconds. Now I want to calculate the time between those rows where turn_marker is equal to 1.
dataframe = main_dataframe.query("turn_marker=='1;'")
main_dataframe["seconds_diff"] = dataframe["seconds"].diff()
main_dataframe
I would be grateful if you could help me.
You can do this:
main_dataframe['indx'] = main_dataframe.index
main_dataframe['diff'] = main_dataframe.sort_values(by=['turn_marker', 'indx'], ascending=[False, True])['seconds'].diff()
main_dataframe.loc[main_dataframe.turn_marker == '0;', 'diff'] = np.nan
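A shorter route to the same result is to compute the diff on a masked selection and let pandas align the result back onto the full frame by index. This is a sketch under assumptions: the sample data is hypothetical, turn_marker is assumed to hold the strings '0;' and '1;' as in the code above, and seconds_diff is the column name used in the question.

```python
import pandas as pd

# Hypothetical sample data in the shape described in the question.
main_dataframe = pd.DataFrame(
    {
        "turn_marker": ["0;", "1;", "0;", "0;", "1;", "1;", "0;"],
        "seconds": [0.0, 1.5, 2.0, 2.5, 4.0, 7.0, 8.0],
    }
)

# Boolean mask selecting the marker rows; no rows are removed from the frame.
is_turn = main_dataframe["turn_marker"] == "1;"

# diff() runs over the marker rows only. Column assignment aligns on the
# index, so all other rows (and the first marker row) end up as NaN.
main_dataframe["seconds_diff"] = main_dataframe.loc[is_turn, "seconds"].diff()

print(main_dataframe)
```

Because the assignment aligns on index labels, this relies on the index being unique.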

Return rows that don't have a specific length in pandas

I am cleaning my dataset and am stuck on some rows that don't have the required length in one column.
The column (order_id) must have 16 characters, and the column type is object. I don't know how to extract all the rows that don't have the exact number of characters, or how to remove those rows.
Thank you.
In Excel I can just filter the column and show only the values that have 16 characters.
I want to do the same in pandas: return only the rows that contain exactly 16 characters and drop every row with more or fewer than 16 characters.
I suppose you want to keep all rows which match this pattern [0-9A-F]{16}:
df = df[df['order_id'].str.contains(r'^[0-9A-F]{16}$')]
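If only the length matters and not the character set, a length check is enough. The sketch below shows both variants side by side on hypothetical order IDs. The hexadecimal pattern is the answer's assumption about what a valid ID looks like, not something the question states.

```python
import pandas as pd

# Hypothetical order IDs: valid hex, too short, too long, 16 non-hex characters.
df = pd.DataFrame(
    {"order_id": ["0123456789ABCDEF", "0123ABC", "0123456789ABCDEF00", "Z" * 16]}
)

# Variant 1: keep rows whose order_id is exactly 16 characters long.
exact_len = df[df["order_id"].str.len() == 16]

# Variant 2: additionally require uppercase hexadecimal characters only.
# fullmatch anchors the pattern at both ends; na=False treats missing values
# as non-matching.
hex_only = df[df["order_id"].str.fullmatch(r"[0-9A-F]{16}", na=False)]

print(exact_len)
print(hex_only)
```

To see the offending rows instead of dropping them, invert either mask with ~.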

Need explanation on how pandas.drop is working here

I have a data frame, let's say xyz. I have written code to find the percentage of null values in each column of the data frame. My code is below:
round(100*(xyz.isnull().sum()/len(xyz.index)), 2)
Let's say I got the following results:
abc 26.63
def 36.58
ghi 78.46
I want to drop column ghi because it has more than 70% of null values.
I achieved it using the following code:
xyz = xyz.drop(xyz.loc[:,round(100*(xyz.isnull().sum()/len(xyz.index)), 2)>70].columns, 1)
But I don't understand how this code works. Can anyone please explain it?
The code is doing the following:
xyz.drop( [...], 1)
removes the specified labels along a given axis, either rows or columns. In this particular case, xyz.drop(..., 1) means you're dropping along axis 1, i.e., columns.
xyz.loc[:, ... ].columns
returns the names of the columns selected by your slicing condition (as an Index object).
round(100*(xyz.isnull().sum()/len(xyz.index)), 2)>70
This expression counts the nulls in each column and divides by the number of rows, effectively computing the percentage of NaN in each column. The result is then rounded to 2 decimal places, and finally the comparison returns True if the percentage of NaN is more than 70. Hence, you get a Boolean Series that maps each column name to True or False.
Putting everything together: you first produce a Boolean Series that marks which columns have more than 70% NaN. Then, with .loc, you use Boolean indexing to select only the columns you want to drop (NaN % > 70%). With .columns you recover the names of those columns, which are then passed to .drop.
Hopefully this clears things up!
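The walkthrough above can be traced by naming each intermediate result. This is only a sketch: the frame is hypothetical, and null_pct, too_sparse and cols_to_drop are names made up for illustration. The axis is passed by keyword here because, as far as I know, pandas 2.0 and later no longer accept it positionally in drop.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with 10 rows: abc is 30% null, def 40%, ghi 80%.
xyz = pd.DataFrame(
    {
        "abc": [1, 2, 3, 4, 5, 6, 7] + [np.nan] * 3,
        "def": [1, 2, 3, 4, 5, 6] + [np.nan] * 4,
        "ghi": [1, 2] + [np.nan] * 8,
    }
)

# Step 1: percentage of nulls per column (a Series indexed by column name).
null_pct = round(100 * (xyz.isnull().sum() / len(xyz.index)), 2)

# Step 2: Boolean Series, True where a column has more than 70% nulls.
too_sparse = null_pct > 70

# Step 3: Boolean indexing on the column axis, then read off the labels.
cols_to_drop = xyz.loc[:, too_sparse].columns

# Step 4: drop those columns (axis=1 means columns).
result = xyz.drop(cols_to_drop, axis=1)

print(null_pct)
print(list(result.columns))
```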
If your code is hard to understand, you can use dropna with thresh instead, since pandas already covers this case.
df=df.dropna(axis=1,thresh=round(len(df)*0.3))
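Note that thresh counts non-null values, so "more than 70% null" becomes "fewer than 30% non-null". A small sketch on a hypothetical frame follows; min_non_null is an illustrative name. Because of the rounding, columns that sit exactly on the boundary may be treated differently from the strict > 70 comparison, so treat the two approaches as approximately equivalent.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with 10 rows: ghi has only 2 non-null values (80% null).
df = pd.DataFrame(
    {
        "abc": [1, 2, 3, 4, 5, 6, 7] + [np.nan] * 3,
        "ghi": [1, 2] + [np.nan] * 8,
    }
)

# thresh is the minimum number of NON-null values a column needs to be kept.
min_non_null = round(len(df) * 0.3)  # 3 of the 10 rows

kept = df.dropna(axis=1, thresh=min_non_null)

print(list(kept.columns))
```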

Pandas giving length must be equal error when trying to replace one column with another

I am trying to fill the null values in one column with the values from another column. I tried two ways:
df.NAME1 = np.where(df.NAME1.isnull(), df.NAME2, df.NAME1)
df['NAME1'] = df['NAME1'].fillna(df['NAME2'])
I get:
Lengths must be equal.