return rows dosn't have specefic number of length in pandas - pandas

am clean my dataset and cleaned it but am stuck in some rows don't have the specific length must have in the column
The column (order_id) must have 16 character the column type is object, so i'dont know how i can extract all rows don't have the exact character must be in column and how to remove those rows
Thank You .
for more information
image of column
in excel i can just filter the column and show only value that has 16 character
i want to do that in pandas i want just to return rows that contain 16 character and drop all row greater or lower than 16 character .

I suppose you want to keep all rows which match this pattern [0-9A-F]{16}:
df = df[df['order_id'].str.contains(r'^[0-9A-F]{16}$')]

Related

Select rows which contain numeric substrings in Pandas

I need to delete rows from a dataframe in which a particular column contains string which contains numeric substrings. See the shaded column of my dataframe.
rows with values like 0E as prefix or 21 (any two digit number) as suffix or 24A (any two digit number with a letter) as suffix should be deleted.
Any suggestions?
Thanks in advance.
You can use boolean indexing with a str.contains() regex:
^0E - starts with 0E
\d{2}$ - ends with 2 digits
\d{2}[A-Z]$ - ends with 2 digits and 1 capital letter
col = ... # target column
mask = df[col].str.contains(r'^0E|\d{2}$|\d{2}[A-Z]$')
df = df.loc[~mask]
#tdy gave a good answer, but only one place need to be modified if I understand it correctly.
For value ends with two digits or two digits and a capital character, the regex should be:
.*\d{2}[A-Z]?$

filling nan values with two different values

Some values in theenter code here FlightNumber column are missing. These numbers are meant to increase by 10 with each row so 10055 and 10075 need to be put in place. Fill in these missing numbers and make the column an integer column (instead of a float column). Some values in the FlightNumber column are missing. These numbers are meant to increase by 10 with each row so 10055 and 10075 need to be put in place. Fill in these missing numbers and make the column an integer column (instead of a float column).
I tried this but not getting the correct result.
df['FlightNumber'].fillna(10055, inplace = True)
df['FlightNumber'].fillna(10075, inplace = True)
df[['FlightNumber']] = df[['FlightNumber']].astype(int)
df
df.loc[df['FlightNumber'].isna(),'FlightNumber'] = df.loc[df[df['FlightNumber'].isna()].index-1,'FlightNumber'].astype(int)+10

Python: Remove exponential in Strings

I have been trying to remove the exponential in a string for the longest time to no avail.
The column involves strings with alphabets in it and also long numbers of more than 24 digits. I tried converting the column to string with .astype(str) but it just reads the line as "1.234123E+23". An example of the table is
A
345223423dd234324
1.234123E+23
how do i get the table to show the full string of digits in pandas?
b = "1.234123E+23"
str(int(float(b)))
output is '123412299999999992791040'
no idea how to do it in pandas with mixed data type in column

Pandas dataframe selection df['a'][50][:51]

I have a dataframe where one of the column name is 'a'
I came across a following selection expression
dataframe['a'][50][:50]
I understand dataframe['a'][50] selects the row 49 in column ['a'], but what does [:50] do?
Thank you
If dataframe['a'][50][:50] doesn't error out and it actually returns something, it means the row 49 in column ['a'] contains iterables(more precisely sequence types) such as list, string, tuple...
dataframe['a'][50][:50] returns the sequence from element 0 to 49 from the value of the row 49 in column ['a'].
As I said above, if the row 49 in column ['a'] doesn't contain a sequence type, you will get errors. Try check dataframe['a'][50] to see if it is a sequence type
Note: dataframe['a'][50] is chain-indexing. It is not recommended. However, it is out of the scope of this question so I don't go into the detail of it.

Delete rows based on char in the index string

I have the following dataframe:
df = pd.DataFrame(np.random.randn(4, 1), index=['mark13', 'luisgimenez', 'miguel72', 'luis34'],columns=['probability'])
probability
mark13 -1.054687
luisgimenez 0.081224
miguel72 -0.893619
luis34 -1.576941
I would like to remove the rows where the last character in the index string does not contain a number .
The desired output would look something like this :
(dropping the row where the index does not finishes with a number)
probability
mark13 -1.054687
miguel72 -0.893619
luis34 -1.576941
I am sure the direction I need to get is the boolean indexing but I do not know how could I reference the last character in the index name
#use isdigt to check last char of your index to be used as a mask array to filter rows.
df[[e[-1].isdigit() for e in df.index]]
Out[496]:
probability
mark13 -0.111338
miguel72 0.548725
luis34 0.682949
You can use the str accessor to check if the last character is a number:
df[df.index.str[-1].str.isdigit()]
Out:
probability
mark13 -0.350466
miguel72 1.220434
luis34 -0.962123