I'm using an update statement to replace any Null or empty string values with "No Cost Center". The data that I have imported into the table has numerous blank/empty string values and when I run the code below, it affects 0 rows:
Update [dbo].[Import_tbl_Inventory] Set [dbo].[Import_tbl_Inventory].[User Defined Label 4] = 'No Cost Center'
Where [dbo].[Import_tbl_Inventory].[User Defined Label 4] is Null or [dbo].[Import_tbl_Inventory].[User Defined Label 4] = ''
Is there something other than NULL and Empty String values that I need to be checking for?
EDIT:
Upon further review, I decided to right-click the table and select Edit Top Rows. Here I discovered that each row in the User Defined Label 4 column actually contains spaces. I was able to delete the spaces out of the first 2 rows manually, but any rows after that give me a message saying: Data in row was not committed. The row values updated or deleted either do not make the row unique or they alter multiple rows
I'm only altering one row at a time, and there is no reason this should have anything to do with making the row unique. Not sure what's going on here.
A "blank" string could be a nonempty string of whitespace characters, which might not be easy to distinguish visually from an empty string. Depending on how you view it, maybe not from NULL, either. To include rows having such values in your update, you can trim them before comparing with the empty string:
Update [dbo].[Import_tbl_Inventory]
Set [dbo].[Import_tbl_Inventory].[User Defined Label 4] = 'No Cost Center'
Where [dbo].[Import_tbl_Inventory].[User Defined Label 4] is Null
or RTRIM([dbo].[Import_tbl_Inventory].[User Defined Label 4]) = ''
You could use LTRIM() instead if you prefer, but you don't need both: if the string contains only spaces, then passing it to either LTRIM() or RTRIM() will yield an empty string. Note that these functions strip spaces only, not tabs or line breaks.
You need to check for unprintable characters such as tab and line feed.
You can also apply LTRIM and RTRIM to your column and check the LEN:
Update [dbo].[Import_tbl_Inventory] Set
[dbo].[Import_tbl_Inventory].[User Defined Label 4] = 'No Cost Center'
Where
LEN(RTRIM(LTRIM(ISNULL([dbo].[Import_tbl_Inventory].[User Defined Label 4],''))))=0
You can use REPLACE
Update [dbo].[Import_tbl_Inventory] Set
[dbo].[Import_tbl_Inventory].[User Defined Label 4] = 'No Cost Center'
Where
LEN(REPLACE(ISNULL( [dbo].[Import_tbl_Inventory].[User Defined Label 4],''),' ',''))=0
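If none of the statements above touch the rows, it can help to look at which characters the "blank" values really contain. The following is a minimal Python sketch, assuming a few values have been copied or exported from the column; the sample strings and the helper name describe_chars are hypothetical and only illustrate the idea.

```python
import unicodedata

def describe_chars(value):
    """Return (code point, Unicode name) for every character in value."""
    return [(ord(ch), unicodedata.name(ch, "UNNAMED")) for ch in value]

# Hypothetical values that all look blank on screen.
samples = ["", "   ", "\t", "\u00a0", " \r\n"]

for s in samples:
    # Stripping only the space character mimics a space-only trim.
    only_spaces = s.strip(" ") == ""
    print(repr(s), only_spaces, describe_chars(s))
```

Any value that is not made up purely of spaces (a tab, a non-breaking space, a carriage return) survives a space-only trim, so it would need to be handled explicitly, for example with REPLACE on that specific character.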
A little cleaner, without the OR:
Update [dbo].[Import_tbl_Inventory] Set [dbo].[Import_tbl_Inventory].[User Defined Label 4] = 'No Cost Center'
Where len(isNull([dbo].[Import_tbl_Inventory].[User Defined Label 4], '')) = 0
Related
Hello,
I am analyzing the following dataset.
The column ['program_number'] has dtype object, but I want to change it to an integer column.
I have tried to replace some values, but it doesn't work.
As you can see, some values such as 6 are duplicated: both '6 ' and 6 appear.
How can I resolve this? Many thanks
UPDATE
Didn't see 1X and 3X at first.
If you need those numbers and just want to remove the X then:
df["Program"] = df["Program"].str.strip(" X").astype(int)
If the column contains data that isn't numeric or shouldn't be converted, you can use pd.to_numeric with errors='coerce'. Any cells that can't be converted become NaN. Be aware that this will result in floating-point numbers.
df["Program"] = pd.to_numeric(df["Program"], errors="coerce")
old
You want to use str.strip() here, rather than replace.
Try this:
df1['program_number'] = df1['program_number'].str.strip().astype(int)
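As a rough sketch of how the two suggestions combine, the example below strips whitespace first and then converts with errors="coerce". The sample values are hypothetical, and the nullable Int64 dtype is one possible way to keep integers when some entries are missing; it is an addition of this sketch, not something the answers above prescribe.

```python
import pandas as pd

# Hypothetical sample resembling the column described in the question.
df = pd.DataFrame({"program_number": ["6 ", "6", " 3", "1X", "n/a"]})

# Remove surrounding whitespace so that "6 " and "6" become the same value.
cleaned = df["program_number"].str.strip()

# Convert what can be converted; everything else (here "1X", "n/a") becomes NaN.
numeric = pd.to_numeric(cleaned, errors="coerce")

# The nullable Int64 dtype keeps integers even though some entries are missing.
df["program_number_int"] = numeric.astype("Int64")
print(df)
```

If values like "1X" should count as 1, strip the X as shown in the update above before converting.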
I have some data that may contain null values.
I want to delete the null values (a whole row or a whole column).
How can I do this?
Here is my data
https://reurl.cc/5lONv6
The time series data has some null values.
following is my code
c=pd.read_csv('./in/historical_01A190.txt',error_bad_lines=False)
c.dropna(axis=0,how='any',inplace=True)
c.dropna(axis=1,how='any',inplace=True)
c.to_csv('./out/historical_01A190.txt',index=False)
But it didn't work.
Can anyone help me?
Okay, first of all, your data isn't saved as a csv. It's saved as a tab-separated file.
So you need to open it using pd.read_table
>>> c=pd.read_table('./data.txt',error_bad_lines=False,sep='\t')
Second, your data is full of nans -- if you use dropna on either rows or columns, you end up with just one row or column (dates) left. But using the correct opener on your file, the dropna and to_csv functions work.
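To illustrate the point about the separator and the effect of how='any', here is a small sketch using an in-memory, tab-separated sample. The sample text and column names are made up for illustration and stand in for the real file.

```python
import io
import pandas as pd

# Hypothetical tab-separated sample standing in for the real file.
raw = (
    "date\ta\tb\n"
    "2020-01-01\t1.0\t\n"
    "2020-01-02\t2.0\t5.0\n"
    "2020-01-03\t\t6.0\n"
)

# sep="\t" tells pandas the fields are tab-separated rather than comma-separated.
c = pd.read_csv(io.StringIO(raw), sep="\t")

rows_kept = c.dropna(axis=0, how="any")  # drop every row with a missing value
cols_kept = c.dropna(axis=1, how="any")  # drop every column with a missing value

print(rows_kept)
print(cols_kept)
```

With how='any', a single missing value is enough to drop a row or column, which is why so little survives when the data is sparse. Passing how='all' or a thresh value is a gentler alternative.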
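If you call dropna without inplace=True, it returns a new DataFrame and leaves the original unchanged, so you need to assign the result back to the variable. Don't combine the assignment with inplace=True, because in that case dropna returns None. to_csv with a path also returns None, so its result shouldn't be assigned.
c = c.dropna(axis=0, how='any')
c = c.dropna(axis=1, how='any')
c.to_csv('./out/historical_01A190.txt', index=False)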
Try this.
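A quick way to see the difference between keeping the returned frame and using inplace=True is the sketch below. The small frame is hypothetical and only serves to show what each call returns.

```python
import pandas as pd

# Hypothetical frame with one missing value.
c = pd.DataFrame({"x": [1.0, None, 3.0], "y": [4.0, 5.0, 6.0]})

# Option 1: keep the new frame that dropna returns; c itself is unchanged here.
kept = c.dropna(axis=0, how="any")

# Option 2: modify c in place; the call itself returns None.
result = c.dropna(axis=0, how="any", inplace=True)

print(result)
print(len(kept), len(c))
```

Either option works on its own; the problem only arises when the None returned by an in-place call is assigned back to the variable.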
I am trying to create a new column based on selection criteria in another column. This is at the end of a while loop, so the data frame does not have the column until this part of the first iteration. All subsequent iterations will be based on this column's total from the previous iteration and the current totals:
if 'cBeds' in sPhase.columns:
sPhase['cBeds'] = np.where(sPhase['COUNTYFP'] == '1', (sPhase['cBeds'] + (sPhase[infCount] * .08)), sPhase['cBeds'])
else:
sPhase['cBeds'] = np.where(sPhase['COUNTYFP'] == '1', (sPhase[infCount] * .08), sPhase['cBeds'])
However, when I run the code I get KeyError: 'cBeds'.
How can I handle updating a column in a conditional when the column doesn't exist on the first iteration?
In the else clause, you reference sPhase['cBeds'] as the third parameter to np.where even though you've already established that the column does not exist.
If you want to avoid this problem, just add the column at the beginning of the loop and give it a default value that you can conditionally change later.
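A minimal sketch of that suggestion follows. The sample data, the column name "infected", and the fixed two-iteration loop are assumptions standing in for the question's real frame and while loop; only sPhase, infCount, COUNTYFP, and cBeds come from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the question's frame and column name.
sPhase = pd.DataFrame({"COUNTYFP": ["1", "2", "1"], "infected": [100, 50, 200]})
infCount = "infected"

# Create the column once, before the loop, with a neutral default.
sPhase["cBeds"] = 0.0

for _ in range(2):  # two iterations stand in for the while loop
    # Rows for county '1' accumulate 8% of the count; other rows keep their value.
    sPhase["cBeds"] = np.where(
        sPhase["COUNTYFP"] == "1",
        sPhase["cBeds"] + sPhase[infCount] * 0.08,
        sPhase["cBeds"],
    )

print(sPhase)
```

Because the column always exists, the if/else branch on 'cBeds' in sPhase.columns is no longer needed.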
I have a dataframe where one of the column name is 'a'
I came across a following selection expression
dataframe['a'][50][:50]
I understand dataframe['a'][50] selects the row 49 in column ['a'], but what does [:50] do?
Thank you
If dataframe['a'][50][:50] doesn't raise an error and actually returns something, it means the entry at index label 50 in column ['a'] holds a sequence type such as a list, string, or tuple. (With the default integer index, label 50 is the 51st row, not row 49.)
dataframe['a'][50][:50] returns the first 50 elements (positions 0 to 49) of that value.
As noted above, if that entry isn't a sequence type, you will get an error. Check dataframe['a'][50] to see whether it is a sequence type.
Note: dataframe['a'][50] is chained indexing, which is not recommended. However, that is outside the scope of this question, so I won't go into detail.
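The behaviour can be reproduced with a small sketch. The frame below is hypothetical: 60 rows whose column 'a' holds long strings, so that slicing has something to cut.

```python
import pandas as pd

# Hypothetical frame: 60 rows whose column 'a' holds long strings.
dataframe = pd.DataFrame({"a": [f"row{i}-" + "x" * 80 for i in range(60)]})

value = dataframe["a"][50]   # the entry whose index label is 50
first_50 = value[:50]        # ordinary Python slicing on that string

# The same lookup in a single step, avoiding chained indexing.
same = dataframe.loc[50, "a"][:50]

print(len(value), len(first_50), first_50 == same)
```

The [:50] part is plain Python slicing applied to whatever object the cell contains; pandas is no longer involved at that point.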
I am trying to replace substrings in a data frame by the lists "name" and "lemma". As long as I enter the lists manually, the code delivers the result in the dataframe m.
name=['Charge','charge','Prepaid']
lemma=['Hallo','hallo','Hi']
m=sdf.replace(regex= name, value =lemma)
As soon as I am reading in both lists from an excel file, my code is not replacing the substrings anymore. I need to use an excel file, since the lists are in one table that is very large.
sdf= pd.read_excel('training_data.xlsx')
synonyms= pd.read_excel('synonyms.xlsx')
lemma=synonyms['lemma'].tolist()
name=synonyms['name'].tolist()
m=sdf.replace(regex= name, value =lemma)
Thanks for your help!
df.replace()
Replace values given in to_replace with value.
Values of the DataFrame are replaced with other values dynamically. This differs from updating with .loc or .iloc, which require you to specify a location to update with some value.
In short, this method doesn't operate at the series level, only on values.
This may achieve what you want:
sdf.regex = synonyms.name
sdf.value = synonyms.lemma
If you are just trying to replace 'Charge' with 'Hallo', 'charge' with 'hallo', and 'Prepaid' with 'Hi', then you can use replace(), passing the list of words to find as the first argument and the list of replacement words as the keyword argument value.
Try this:
df=df.replace(name, value=lemma)
Example:
name=['Charge','charge','Prepaid']
lemma=['Hallo','hallo','Hi']
df = pd.DataFrame([['Bob', 'Charge', 'E333', 'B442'],
['Karen', 'V434', 'Prepaid', 'B442'],
['Jill', 'V434', 'E333', 'charge'],
['Hank', 'Charge', 'E333', 'B442']],
columns=['Name', 'ID_First', 'ID_Second', 'ID_Third'])
df=df.replace(name, value=lemma)
print(df)
Output:
    Name  ID_First  ID_Second  ID_Third
0    Bob     Hallo       E333      B442
1  Karen      V434         Hi      B442
2   Jill      V434       E333     hallo
3   Hank     Hallo       E333      B442
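Note that the example above replaces whole cell values. Since the question asks about substrings, the sketch below contrasts the two call styles on a hypothetical column where the search terms are only part of each cell. The frame contents are made up; name and lemma are the lists from the question.

```python
import pandas as pd

name = ["Charge", "charge", "Prepaid"]
lemma = ["Hallo", "hallo", "Hi"]

# Hypothetical column where the search terms are only part of each cell.
sdf = pd.DataFrame({"text": ["Charge fee", "no charge", "Prepaid card"]})

# Plain list replacement only matches cells equal to one of the names.
whole_cell = sdf.replace(name, value=lemma)

# With regex=..., each name is treated as a pattern and matched inside the cell.
substring = sdf.replace(regex=name, value=lemma)

print(whole_cell["text"].tolist())
print(substring["text"].tolist())
```

When the lists come from a spreadsheet, it may also be worth checking them for missing values or stray whitespace before passing them to replace, although whether that is the cause here is only a guess.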