replace value under column in pandas under condition - pandas

I want to replace the values in the Severity column with the following values:
4:out for season
3:out indefinitely
2:DNP
1:DTD
but .replace doesn't seem to work. Is there another way to do this?

I think you have things reversed in your replacement code.
Try this out and report back what happens:
replace_dict = {'DTD':1,'DNP':2,'out indefinitely':3,'out for season':4}
df = pd.DataFrame({'severity':'out for season'},index=[0])
df['severity'] = df['severity'].replace(replace_dict)
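A common reason .replace appears to do nothing is that it returns a new Series instead of modifying the column, so the result has to be assigned back. A minimal sketch, assuming the column holds the text labels exactly as listed in the question; the sample rows are made up, and dtype=object is only there to keep the behaviour the same across pandas versions:

```python
import pandas as pd

# Made-up sample rows; the real DataFrame comes from the asker's data.
df = pd.DataFrame(
    {"Severity": ["DTD", "out for season", "DNP", "out indefinitely"]},
    dtype=object,
)

replace_dict = {"DTD": 1, "DNP": 2, "out indefinitely": 3, "out for season": 4}

# Without assignment the original column is left untouched.
df["Severity"].replace(replace_dict)
unchanged = df["Severity"].tolist()

# Assigning the result back stores the replaced values.
df["Severity"] = df["Severity"].replace(replace_dict)
print(df["Severity"].tolist())
```

If the labels in the real data carry extra spaces or different capitalisation, they will not match the dictionary keys and will pass through unchanged.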

Related

Pandas Replace_ column values

Hello,
I am analyzing a dataset with the following information.
The column ['program_number'] has dtype object, but I want to convert it to an integer column.
I have tried to replace some values, but it doesn't work.
As you can see, some values are duplicated: for example, 6 appears both as '6 ' and as 6.
How can I resolve this? Many thanks.
UPDATE
I didn't see 1X and 3X at first.
If you need those numbers and just want to remove the X then:
df["Program"] = df["Program"].str.strip(" X").astype(int)
If the column contains values that aren't numbers or that shouldn't be converted, you can use pd.to_numeric with errors='coerce'. Cells that can't be converted become NaN. Be aware that this results in a floating-point column.
df["Program"] = pd.to_numeric(df["Program"], errors="coerce")
old
You want to use str.strip() here, rather than replace.
Try this:
df1['program_number'] = df1['program_number'].str.strip().astype(int)
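The two suggestions above can be combined. This is a rough sketch on made-up values, not the asker's data. It assumes the column mixes real integers with strings, in which case .str methods return NaN for the non-string cells, so the values are cast to str first. The nullable "Int64" dtype is one way to keep whole numbers alongside missing values.

```python
import pandas as pd

# Made-up values: stray spaces, a real integer, an "X" suffix, and junk.
df = pd.DataFrame({"Program": ["6 ", 6, "1X", "3X", "n/a"]})

# Cast to str so .str works on every cell, then strip spaces and X from both ends.
cleaned = df["Program"].astype(str).str.strip(" X")

# Cells that cannot be parsed become NaN, which forces a float result.
as_float = pd.to_numeric(cleaned, errors="coerce")

# Convert to the nullable integer dtype to keep whole numbers plus missing values.
df["Program"] = as_float.astype("Int64")
print(df["Program"])
```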

Pandas str split. Can I skip line which gives troubles?

I have a dataframe (all5) including one column with dates('CREATIE_DATUM'). Sometimes the notation is 01/JAN/2015 sometimes it's written as 01-JAN-15.
I only need the year, so I wrote the following code line:
all5[['Day','Month','Year']]=all5['CREATIE_DATUM'].str.split('-/',expand=True)
but I get the following error:
columns must be same length as key
so I assume somewhere in my dataframe (>100.000 lines) a value has more than two '/' signs.
How can I make my code skip this line?
You can try pd.to_datetime and then use the .dt accessor to get the day, month and year:
x = pd.to_datetime(all5["CREATIE_DATUM"])
all5["Day"] = x.dt.day
all5["Month"] = x.dt.month
all5["Year"] = x.dt.year
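Note that str.split('-/') splits on the literal two-character string "-/", not on either character, so most rows are probably not split at all, which would explain the error. When a column mixes two known layouts, one option is to parse each layout with an explicit format and errors="coerce", then combine the results. A sketch under those assumptions; the sample rows and the two format strings are guesses based on the examples in the question:

```python
import pandas as pd

# Made-up sample with both notations and one malformed entry.
all5 = pd.DataFrame({"CREATIE_DATUM": ["01/JAN/2015", "01-JAN-15", "not a date"]})

col = all5["CREATIE_DATUM"]

# Try each known layout; rows that do not match become NaT instead of raising.
slash_dates = pd.to_datetime(col, format="%d/%b/%Y", errors="coerce")
dash_dates = pd.to_datetime(col, format="%d-%b-%y", errors="coerce")

# Take the first successful parse per row.
parsed = slash_dates.fillna(dash_dates)

# Rows that matched neither layout stay missing rather than stopping the script.
all5["Year"] = parsed.dt.year
print(all5)
```

Because unparseable rows end up as missing values, the Year column comes back as floats; the rows with problems can be inspected with all5[parsed.isna()].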

Pandas.dropna method can't delete Nan value rows(or columns)

I have some data that may contain null values.
I want to delete the null values (a whole row or a whole column).
How can I do this?
Here is my data:
https://reurl.cc/5lONv6
The time series data contains some null values.
Here is my code:
c=pd.read_csv('./in/historical_01A190.txt',error_bad_lines=False)
c.dropna(axis=0,how='any',inplace=True)
c.dropna(axis=1,how='any',inplace=True)
c.to_csv('./out/historical_01A190.txt',index=False)
But it didn't work.
Can anyone help me?
Okay, first of all, your data isn't saved as a csv. It's saved as a tab-separated file.
So you need to open it using pd.read_table
c=pd.read_table('./data.txt',on_bad_lines='skip',sep='\t')  # on_bad_lines replaces error_bad_lines (pandas 1.3+)
Second, your data is full of NaNs: if you use dropna on either rows or columns, you end up with just one row or column (dates) left. But once the file is read with the correct function, dropna and to_csv work.
dropna returns a new DataFrame unless inplace=True is passed, so if you leave out inplace=True you need to assign the result back. Don't combine the two: with inplace=True, dropna returns None, and so does to_csv when given a path.
c = c.dropna(axis=0,how='any')
c = c.dropna(axis=1,how='any')
c.to_csv('./out/historical_01A190.txt',index=False)
Try this.
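To make the points above concrete, here is a small sketch on a made-up frame (not the linked file). It shows how how="any" can strip almost everything from NaN-heavy data, how how="all" is more lenient, and what dropna returns with inplace=True.

```python
import numpy as np
import pandas as pd

# Made-up frame: a complete date column plus sparse measurement columns.
c = pd.DataFrame({
    "date": ["2020-01-01", "2020-01-02", "2020-01-03"],
    "a": [1.0, np.nan, 3.0],
    "b": [np.nan, np.nan, np.nan],
})

# how="any" drops every column with at least one NaN: only "date" survives.
strict = c.dropna(axis=1, how="any")

# how="all" drops only columns that are entirely NaN: "b" goes, "a" stays.
lenient = c.dropna(axis=1, how="all")

# With inplace=True the frame is modified and the call returns None.
result = c.dropna(axis=0, how="any", inplace=True)
print(list(strict.columns), list(lenient.columns), result, len(c))
```

Every row of this frame has a NaN in column "b", so the final row-wise drop leaves c empty.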

Length of value issue with unique ids

I am trying to write a simple code and haven't found a simple answer for this. I am trying to assign a unique ID to each person based on when the file was amended and their employee ID. Then add the column of Unique IDs to the file.
excel1 = "Book1.xlsx"
df1 = pd.read_excel(excel1, header = 0)
time = time.strftime('%m%d%Y%H%m', time.gmtime(os.path.getmtime ("Book1.xlsx")))
unique_id=[df1["ID"] + time]
df1["CID"]=unique_id
When I try to run it, I keep getting this error:
ValueError: Length of values does not match length of index
Does anyone have an answer for this?
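One likely cause: wrapping the expression in square brackets creates a list with a single element (the whole Series), so its length is 1 rather than the number of rows. Two smaller points: %m in the format string is the month, while minutes are %M, and assigning to the name time rebinds the time module. A possible fix, sketched with a made-up frame and a placeholder timestamp in place of the Excel file and its modification time:

```python
import pandas as pd

# Made-up stand-in for pd.read_excel("Book1.xlsx").
df1 = pd.DataFrame({"ID": [101, 102, 103]})

# Placeholder for the formatted modification time; in the real script this
# would come from time.strftime(...) applied to os.path.getmtime(...),
# stored under a name other than "time".
stamp = "010220151530"

# Element-wise concatenation: cast the IDs to strings, then append the stamp.
df1["CID"] = df1["ID"].astype(str) + stamp
print(df1)
```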

What is the cleanest way to create a new column based on a conditional of an existing column?

In pandas I currently have a data frame containing a column of strings: {Urban, Suburban, Rural}. The column I would like to create is conditional on the first column, i.e. Urban, Suburban and Rural map to the corresponding colors {Coral, Skyblue, Gold}.
I tried copying the first column and then using .replace but my new column seems to return NaN values now instead of the colors.
new_column = merge_table["type"]
merge_table["color"] = new_column
color_df = merge_table["color"].replace({'Urban': 'Coral', 'Suburban': 'Skyblue', 'Rural': 'Gold'})
data = pd.DataFrame({'City Type': type,
'Bubble Color': color_df
})
data.head()
You can do
merge_table['New col']=merge_table["color"].replace({'Urban': 'Coral', 'Suburban': 'Skyblue', 'Rural': 'Gold'})
Okay. In the future, it's worth formatting your code with 'Code Samples' so that we can read it more easily.
Several things in your code can be improved. First, with mapping_dictionary set to the dict you passed to .replace, you can do the entire thing in one line:
merge_table["color"] = merge_table["type"].map(mapping_dictionary)
For your information, Series.map() can be around 4 times faster than Series.replace(), though the exact speedup depends on the data.
Other tips:
Never use type as a variable name; use something more specific like city_type. type is already a Python built-in.
data = pd.DataFrame({'City Type': city_type, 'Bubble Color': color_df})
If you make a copy of a column, use:
a_series = df['column_name'].copy()
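One difference between the two approaches is worth knowing before switching: .map() turns values that are missing from the dictionary into NaN, while .replace() leaves them unchanged. A short sketch on made-up rows; "Exurban" is a hypothetical unmapped value added for illustration:

```python
import pandas as pd

# Made-up rows; "Exurban" has no entry in the mapping.
merge_table = pd.DataFrame({"type": ["Urban", "Suburban", "Rural", "Exurban"]})

mapping_dictionary = {"Urban": "Coral", "Suburban": "Skyblue", "Rural": "Gold"}

# map: values without a key in the dictionary become NaN.
merge_table["color"] = merge_table["type"].map(mapping_dictionary)

# replace: values without a key pass through unchanged.
merge_table["color_replace"] = merge_table["type"].replace(mapping_dictionary)

print(merge_table)
```

NaN in the mapped column is therefore a quick way to spot labels that the dictionary does not cover.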