Select rows which contain numeric substrings in Pandas

I need to delete rows from a dataframe in which a particular column contains strings with numeric substrings. See the shaded column of my dataframe.
Rows whose values have 0E as a prefix, 21 (any two-digit number) as a suffix, or 24A (any two-digit number followed by a letter) as a suffix should be deleted.
Any suggestions?
Thanks in advance.

You can use boolean indexing with a str.contains() regex:
^0E - starts with 0E
\d{2}$ - ends with 2 digits
\d{2}[A-Z]$ - ends with 2 digits and 1 capital letter
col = ... # target column
mask = df[col].str.contains(r'^0E|\d{2}$|\d{2}[A-Z]$')
df = df.loc[~mask]

@tdy gave a good answer, but if I understand it correctly, only one place needs to be modified.
For a value that ends with two digits, or with two digits and a capital letter, the regex should be:
.*\d{2}[A-Z]?$
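
To check both answers against each other, here is a minimal sketch on made-up data. The column name `code` and the sample values are assumptions, not taken from the question. Because `str.contains()` searches anywhere in the string, the leading `.*` in the second regex is left out here; the optional `[A-Z]?` is what merges the two suffix rules.

```python
import pandas as pd

# Hypothetical sample data; the column name "code" is an assumption.
df = pd.DataFrame({"code": ["0E123X", "ABC21", "XYZ24A", "KEEP", "A1B", "12AB"]})
col = "code"

# Pattern from the first answer: three explicit alternatives.
mask = df[col].str.contains(r"^0E|\d{2}$|\d{2}[A-Z]$")

# Variant with the two suffix rules merged through an optional capital letter.
mask_merged = df[col].str.contains(r"^0E|\d{2}[A-Z]?$")

# Keep only the rows that do NOT match.
kept = df.loc[~mask]
print(kept[col].tolist())
```

On this sample both masks select the same rows, so either form can be used for the filter.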

Related

Remove just strings from the entries in my first column of data frame

I have strings and numbers in my first column of a data frame:
rn
AT457
X5377
X3477
I want to remove just the letters and keep the numbers from each entry in the column called rn.
Any help is appreciated.

Use a regular expression to do this.
For example, with R :
## Sample data :
df=data.frame(rn=c("AT457","X5377","X3477"))
## Replace the letters with *nothing* ('\D' is used to identify non-digit characters)
df$rn_strip=gsub('\\D',"",df$rn)
## Output :
     rn rn_strip
1 AT457      457
2 X5377     5377
3 X3477     3477
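
The answer above is in R. If the data lives in a pandas DataFrame instead, the same regular expression can be applied with `Series.str.replace`. This is only a sketch of the equivalent idea, reusing the sample values from the question.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({"rn": ["AT457", "X5377", "X3477"]})

# \D matches any non-digit character; replacing it with "" keeps only the digits.
df["rn_strip"] = df["rn"].str.replace(r"\D", "", regex=True)
print(df)
```

The result is still a string column; convert it with `astype(int)` if numbers are needed.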

Return rows that don't have a specific length in pandas

I am cleaning my dataset, but I am stuck on some rows whose values don't have the length the column requires.
The column (order_id) must have 16 characters, and the column type is object. I don't know how to extract all rows that don't have the exact number of characters, or how to remove those rows.
Thank you.
For more information, see the image of the column.
In Excel I can just filter the column and show only the values that have 16 characters.
I want to do that in pandas: return only the rows that contain 16 characters and drop all rows with more or fewer than 16 characters.

I suppose you want to keep all rows which match this pattern [0-9A-F]{16}:
df = df[df['order_id'].str.contains(r'^[0-9A-F]{16}$')]
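
The pattern above assumes the IDs are uppercase hexadecimal, which the question does not state. A minimal sketch comparing a pure length check with the stricter pattern check follows; the sample values are made up.

```python
import pandas as pd

# Hypothetical sample values: only the first and last are exactly 16 characters.
df = pd.DataFrame(
    {"order_id": ["0123456789ABCDEF", "12345", "0123456789ABCDEF00", "Z" * 16]}
)

# Length-only filter: keeps any value with exactly 16 characters.
by_length = df[df["order_id"].str.len() == 16]

# Stricter filter: exactly 16 uppercase hexadecimal characters.
by_pattern = df[df["order_id"].str.fullmatch(r"[0-9A-F]{16}")]

print(by_length["order_id"].tolist())
print(by_pattern["order_id"].tolist())
```

Inverting either condition with `~` returns the rows that would be dropped, which is useful for inspecting them before removal.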

How to add a character at the third-to-last position of a string?

I have a column with numbers of various lengths, such as 50055, 1055, 155, etc. How can I add a decimal point before the last two digits of each so that the results are 500.55, 10.55, and 1.55?
I tried using replace to find the last two digits and replace them with '.' || the last two digits. That doesn't always work, because the same sequence can occur more than once in the same string.
replace(round(v_num/2),substr(round(v_num/2),-2),'.'||substr(round(v_num/2),-2))

You would divide by 100:
select v_num / 100
You can convert this into a string, if you want.
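
The same arithmetic can be checked outside the database. This Python sketch is an illustration only, not part of the SQL answer. It uses `Decimal` to avoid binary floating-point rounding and formats the result with two decimal places.

```python
from decimal import Decimal

# Sample values from the question.
values = [50055, 1055, 155]

# Dividing by 100 moves the decimal point two places to the left;
# ".2f" keeps both decimals even when they are zeros.
shifted = [f"{Decimal(v) / 100:.2f}" for v in values]
print(shifted)
```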

How to extract just numeric value with REGEXP_EXTRACT in BigQuery?

I am trying to extract just the numbers from a particular column in BigQuery.
The fields concerned have this format: value = "Livraison_21J|Relais_19J" or "RELAIS_15 DAY"
I am trying to extract the number of days for each value preceded by the keyword "Relais".
The days range from 1 to 100.
I used this to do so:
SELECT CAST(REGEXP_EXTRACT(delivery, r"RELAIS_([0-9]+J)") as string) as relayDay
FROM TABLE
I want to be able to extract just the number of days regardless of the string that comes after the numbers, be it "J" or "DAY".
Sample data :
RETRAIT_2H|LIVRAISON_5J|RELAIS_5J | 5J
LIVRAISON_21J|RELAIS_19J | 19J
LIVRAISON_21J|RELAIS_19J | 19J
RETRAIT_2H|LIVRAISON_3J|RELAIS_3J | 3J

You may use
REGEXP_EXTRACT(delivery, r"(?:.*\D)?(\d+)\s*(?:J|DAY)")
See the regex demo
Details
(?:.*\D)? - an optional non-capturing group that matches 0+ chars other than line break chars, as many as possible, and then a non-digit char (this pattern is required to advance the index to the location right before the last sequence of digits, not the last digit)
(\d+) - Group 1 (just what the REGEXP_EXTRACT returns): one or more digits
\s* - 0+ whitespaces
(?:J|DAY) - J or DAY substrings.
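
The pattern can be tried locally with Python's `re` module before running it in BigQuery. This is only a rough check: BigQuery uses the RE2 engine, so behaviour there should be confirmed separately. The sample strings come from the question.

```python
import re

# Same pattern as in the answer above.
pattern = re.compile(r"(?:.*\D)?(\d+)\s*(?:J|DAY)")

samples = [
    "RETRAIT_2H|LIVRAISON_5J|RELAIS_5J",
    "LIVRAISON_21J|RELAIS_19J",
    "RETRAIT_2H|LIVRAISON_3J|RELAIS_3J",
    "RELAIS_15 DAY",
]

# Group 1 holds the digits only, without the trailing "J" or "DAY".
days = [pattern.search(s).group(1) for s in samples]
print(days)
```

Note that the pattern picks the last number followed by J or DAY. In these samples that is always the RELAIS entry; if RELAIS could appear earlier in the string, the pattern would need to name the keyword explicitly.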

Python: Remove exponential in Strings

I have been trying to remove the exponential notation from a string for the longest time, to no avail.
The column contains strings with letters in them and also long numbers of more than 24 digits. I tried converting the column to string with .astype(str), but it just reads the value as "1.234123E+23". An example of the table is
A
345223423dd234324
1.234123E+23
How do I get the table to show the full string of digits in pandas?

b = "1.234123E+23"
str(int(float(b)))
The output is '123412299999999992791040'.
I have no idea how to do this in pandas with mixed data types in the column.
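
One possible approach for a mixed column, sketched under assumptions: parse each value with `decimal.Decimal` and leave anything that is not a number untouched. The helper name `expand_exponent` and the new column `A_full` are hypothetical. `Decimal` is used instead of `float` because it avoids the binary rounding visible in the output above. Digits that were already dropped when the value was stored as "1.234123E+23" cannot be recovered this way; they come back as zeros.

```python
from decimal import Decimal, InvalidOperation

import pandas as pd


def expand_exponent(value: str) -> str:
    """Return scientific-notation text as plain digits; leave other text unchanged."""
    try:
        number = Decimal(value)
    except InvalidOperation:
        return value  # not a number, e.g. "345223423dd234324"
    if not number.is_finite():
        return value  # leave "NaN" or "Infinity" as they are
    return format(number, "f")  # "f" writes the number without an exponent


# Sample values from the question.
df = pd.DataFrame({"A": ["345223423dd234324", "1.234123E+23"]})
df["A_full"] = df["A"].map(expand_exponent)
print(df["A_full"].tolist())
```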
