Python: Remove exponential in Strings - pandas

I have been trying to remove the exponential notation in a string for the longest time, to no avail.
The column contains strings with letters in them as well as long numbers of more than 24 digits. I tried converting the column to string with .astype(str), but it just renders the value as "1.234123E+23". An example of the table is
A
345223423dd234324
1.234123E+23
How do I get the table to show the full string of digits in pandas?

b = "1.234123E+23"
str(int(float(b)))
the output is '123412299999999992791040'
I have no idea how to do it in pandas with a mixed data type in the column.
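One way to handle the mixed column, sketched below: going through float already loses precision (as the output above shows), while Decimal parses the scientific notation exactly, and a try/except leaves the non-numeric strings untouched. The column name A matches the example table; the helper expand_exponential is just an illustrative name.
from decimal import Decimal, InvalidOperation
import pandas as pd

df = pd.DataFrame({'A': ['345223423dd234324', '1.234123E+23']})

def expand_exponential(value):
    # Decimal keeps every digit; format(..., 'f') suppresses scientific notation
    try:
        return format(Decimal(value), 'f')
    except InvalidOperation:
        return value  # leave non-numeric strings untouched

df['A'] = df['A'].astype(str).map(expand_exponential)
print(df['A'].tolist())  # ['345223423dd234324', '123412300000000000000000']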

Return rows that don't have a specific length in pandas

I am cleaning my dataset, but I am stuck on some rows that don't have the specific length the column must have.
The column (order_id) must have 16 characters and its type is object. I don't know how to extract all the rows that don't have the exact number of characters, and how to remove those rows.
Thank you.
For more information: [image of column]
In Excel I can just filter the column and show only the values that have 16 characters.
I want to do that in pandas: return only the rows that contain 16 characters and drop all rows with more or fewer than 16 characters.
I suppose you want to keep all rows which match this pattern [0-9A-F]{16}:
df = df[df['order_id'].str.contains(r'^[0-9A-F]{16}$')]
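If the order IDs are not limited to hex characters, a plain length check is a simpler alternative; a minimal sketch:
df = df[df['order_id'].str.len() == 16]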

Convert float64 to a string with thousand separators

I have a Population Estimate series with numbers as float64, and I need to convert them to strings with a thousands separator (using commas), keeping all significant digits (no rounding).
e.g. 12345678.90345 -> 12,345,678.90345
Try applying a comma-float string formatter (note that .5f rounds and pads to exactly five decimal places):
population = population.apply('{:,.5f}'.format)
To achieve the desired formatting, you could use '{:,}'.format.
This uses commas as the thousands separator and outputs only the digits that are actually in your data, without clipping or padding to a fixed number of decimal places.
data = data.apply('{:,}'.format)
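A quick sketch comparing the two formatters on made-up values:
import pandas as pd

population = pd.Series([12345678.90345, 1234.5])
print(population.apply('{:,.5f}'.format).tolist())  # ['12,345,678.90345', '1,234.50000']
print(population.apply('{:,}'.format).tolist())     # ['12,345,678.90345', '1,234.5']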

Select rows which contain numeric substrings in Pandas

I need to delete rows from a dataframe in which a particular column contains strings that contain numeric substrings. See the shaded column of my dataframe.
Rows with values like 0E as a prefix, or 21 (any two-digit number) as a suffix, or 24A (any two-digit number plus a letter) as a suffix should be deleted.
Any suggestions?
Thanks in advance.
You can use boolean indexing with a str.contains() regex:
^0E - starts with 0E
\d{2}$ - ends with 2 digits
\d{2}[A-Z]$ - ends with 2 digits and 1 capital letter
col = ... # target column
mask = df[col].str.contains(r'^0E|\d{2}$|\d{2}[A-Z]$')
df = df.loc[~mask]
@tdy gave a good answer, but one place needs to be modified, if I understand it correctly.
For values that end with two digits, or with two digits and a capital letter, the regex should be:
.*\d{2}[A-Z]?$
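Putting the two answers together, a minimal sketch on made-up values (the column name code is assumed for illustration):
import pandas as pd

df = pd.DataFrame({'code': ['0E123', 'ABC21', 'XYZ24A', 'KEEPME']})

# ^0E matches the prefix; \d{2}[A-Z]?$ matches two digits optionally followed by a capital letter
mask = df['code'].str.contains(r'^0E|\d{2}[A-Z]?$', na=False)
print(df.loc[~mask])  # only the 'KEEPME' row remains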

Using Pyspark to convert column from string to timestamp

I have a PySpark dataframe with 2 columns (Violation_Time, Time_First_Observed) which are captured as strings. A sample of the data is below, where the times are captured as HHmm with "A" or "P" representing am or pm. The data also has errors, where some entries exceed 24HH.
Violation_Time Time_First_Observed
0830A 1600P
1450P 0720A
1630P 2540P
0900A 0100A
I would like to use PySpark to remove the "A" and "P" from both columns and subsequently convert the data (e.g., 0800, 1930, etc.) into a timestamp for analysis purposes. I have tried to do this for the Violation_Time column, creating a new column "timestamp" to store the result (see code below), but I can't seem to get it to work. Any form of help is appreciated, thank you.
sparkdf3.withColumn('timestamp',F.to_timestamp("Violation_Time", "HH"))
sparkdf3.select(['Violation_Time','timestamp']).show()
You can use the following:
sparkdf3 = sparkdf3.withColumn('timestamp', func.split(func.to_timestamp('Violation_Time', 'HHmm'), ' ').getItem(1))
sparkdf3.select(['Violation_Time','timestamp']).show()
Explanation
sparkdf3.withColumn('timestamp',
    func.split(
        func.to_timestamp('Violation_Time', 'HHmm'),  # parse HHmm into a full timestamp (dated 1970-01-01)
        ' '
    ).getItem(1)  # split on the space and keep the second item, i.e. the time portion
)
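Note that the sample values still end in "A" or "P", which the HHmm pattern will not parse, so the letter needs to be stripped first. A minimal sketch using regexp_replace (this does not yet add 12 hours for "P" times, and invalid entries such as 2540P simply come out as null):
import pyspark.sql.functions as func

sparkdf3 = sparkdf3.withColumn(
    'time_clean', func.regexp_replace('Violation_Time', '[AP]$', '')  # drop the trailing am/pm letter
)
sparkdf3 = sparkdf3.withColumn(
    'timestamp', func.split(func.to_timestamp('time_clean', 'HHmm'), ' ').getItem(1)
)
sparkdf3.select(['Violation_Time', 'timestamp']).show()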

concatenate values of a string column and a long column in a pandas dataframe

I have a pandas data frame which doesn't have an index yet (just the artificial 1,2,3,... index).
Columns 'store' and 'style' are strings; columns 'color' and 'size' are long ints.
None of them is unique by itself, but their concatenation is unique.
I want to concatenate them to produce an index, but
df2['store']+df2['style']+str(df2['color'])+str(df2['size'])
or
df2['store']+df2['style']+df2['color'].to_string()+df2['size'].to_string()
both don't work. I think each takes the whole column, forces it into a single string, and concatenates that, which results in weird symbols. And merges don't work correctly either.
What's the correct way to concatenate a string column and a long column?
This should be:
df2['store'] + df2['style'] + df2['color'].astype(str) + df2['size'].astype(str)
Explanation: str(df2['size']) makes a single string representation of the full column (comparable to what you see if you print the series), while .astype(str) converts each value of the series to a string.
to_string gives the same result as str() (but takes optional parameters to control the output).
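A minimal sketch on made-up data, using the concatenation as the index:
import pandas as pd

df2 = pd.DataFrame({
    'store': ['S1', 'S2'],
    'style': ['A', 'B'],
    'color': [10, 20],
    'size': [42, 43],
})

df2.index = df2['store'] + df2['style'] + df2['color'].astype(str) + df2['size'].astype(str)
print(df2.index)  # Index(['S1A1042', 'S2B2043'], dtype='object')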