Check whether a column value consists of 3 digits, and change the value to its first digit - sql

I'm trying to create a new column called 'team'. In the image below you can see different types of codes. The first digit of the code is the team someone is in, but only if the code consists of exactly 3 characters. E.g.: 315 = team 3, 240 = team 2, and 3300 = NULL.
In the image below you can see my data flow so far and the expression I have tried, which doesn't work.

You forgot the parentheses () in your regex.
Try:
^([0-9]{3})$
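To illustrate what the pattern does, here is a minimal Python sketch (not the SQL or data-flow expression from the question, which is not shown in the post). The helper name team_from_code is hypothetical; it applies the same regex and keeps the first digit only when the code is exactly three digits long.

```python
import re

# Same pattern as in the answer: exactly three digits, captured as group 1.
THREE_DIGITS = re.compile(r"^([0-9]{3})$")

def team_from_code(code):
    """Return the first digit of a 3-digit code as an int, else None.

    Hypothetical helper for illustration; None plays the role of NULL.
    """
    match = THREE_DIGITS.fullmatch(str(code))
    if match is None:
        return None
    return int(match.group(1)[0])

print(team_from_code(315))   # first digit of a 3-digit code
print(team_from_code(3300))  # four digits, so no team
```

Codes with any other length, or containing non-digit characters, fall through to None.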

Related

Pandas Replace_ column values

Hello,
I am analyzing the following dataset, which contains this information.
The column ['program_number'] is an object, but I want to change it to an integer column.
I have tried to replace some values, but it doesn't work.
As you can see, some values are duplicated, for example '6 ' and 6.
How can I resolve this? Many thanks
UPDATE
Didn't see 1X and 3X at first.
If you need those numbers and just want to remove the X then:
df["Program"] = df["Program"].str.strip(" X").astype(int)
If the column contains data that isn't numeric or that shouldn't be converted, you can use pd.to_numeric with errors='coerce'. Cells that can't be converted become NaN. Be aware that this results in floating-point numbers.
df["Program"] = pd.to_numeric(df["Program"], errors="coerce")
old
You want to use str.strip() here, rather than replace.
Try this:
df1['program_number'] = df1['program_number'].str.strip().astype(int)
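The two suggestions above can be combined. The following is only a sketch on made-up sample values (the real dataset is not shown in the post): it strips spaces and a trailing X, coerces anything else to missing, and uses the nullable "Int64" dtype so the result stays integer even with missing values. The column name program_number_clean is an assumption for illustration.

```python
import pandas as pd

# Made-up sample mimicking the values described in the question.
df = pd.DataFrame({"program_number": ["6 ", "6", "1X", "3X", "n/a"]})

# Remove surrounding spaces and X characters, e.g. "6 " -> "6", "1X" -> "1".
stripped = df["program_number"].str.strip(" X")

# Values that still are not numbers become NaN instead of raising an error.
as_numbers = pd.to_numeric(stripped, errors="coerce")

# "Int64" (capital I) is pandas' nullable integer dtype; it keeps missing values.
df["program_number_clean"] = as_numbers.astype("Int64")

print(df)
```

If every value is known to be convertible, the plain .astype(int) from the answers above is enough.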

Find a column that contains a specific string from another column

I have 2 data frames, one called cuartos (rooms in English) and another called paredes (walls in English). They hold room temperatures and wall temperatures. I want to create a new data frame with the temperature difference between each wall and its room. For example:
Room name = 2_APTO_1
Walls of the room = 2_APTO_1.FACE2, 2_APTO_1.FACE3 and 2_APTO_1.FACE4
The new data frame should be something like
2_APTO_1.FACE2 = 2_APTO_1.FACE2 - 2_APTO_1
2_APTO_1.FACE3 = 2_APTO_1.FACE3 - 2_APTO_1
2_APTO_1.FACE4 = 2_APTO_1.FACE4 - 2_APTO_1 ....
I tried this:
# get a list of paredes and cuartos columns
col_names_paredes= paredes.columns.tolist()
col_names_cuartos= cuartos.columns.tolist()
# Check if col_names_paredes has col_names_cuartos names in it
for i in col_names_cuartos:
    for k in col_names_paredes:
        if col_names_paredes[k] in col_names_cuartos[i]:
            print(k)
I got this error
TypeError: list indices must be integers or slices, not str
Any help would be appreciated.
When you write for i in col_names_cuartos, i takes the column names themselves, not the index values you would get with for i in range(len(col_names_cuartos)). Indexing the list with a name is what raises the TypeError.
So you can use the following code instead:
for col_cuartos in col_names_cuartos:
    for col_paredes in col_names_paredes:
        # a wall name such as 2_APTO_1.FACE2 contains its room name 2_APTO_1
        if col_cuartos in col_paredes:
            print(col_paredes)
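Going one step further towards the data frame of differences the question asks for, here is a minimal sketch. It assumes every wall column is named "<room>.<face>" (as in 2_APTO_1.FACE2) and that both frames share the same row index; the temperature values and the name diferencias are made up for illustration.

```python
import pandas as pd

# Made-up temperatures; the real frames are not shown in the post.
cuartos = pd.DataFrame({"2_APTO_1": [20.0, 21.0]})
paredes = pd.DataFrame({
    "2_APTO_1.FACE2": [22.0, 23.5],
    "2_APTO_1.FACE3": [19.0, 20.0],
})

diferencias = pd.DataFrame(index=paredes.index)
for col_paredes in paredes.columns:
    # Assumption: the room name is everything before the last dot.
    room = col_paredes.rsplit(".", 1)[0]
    if room in cuartos.columns:
        # Wall temperature minus the temperature of its room, row by row.
        diferencias[col_paredes] = paredes[col_paredes] - cuartos[room]

print(diferencias)
```

Splitting on the dot rather than testing for a substring avoids matching a room such as 2_APTO_1 against walls of a room like 2_APTO_10.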

CR | Copy data to another row using a formula field or variable

Here is my problem:
Raw data 1
If there is a position 105 and 150, I need the material number of position 150. If there is only position 105, I need the material number of position 105.
On the right side of the picture you can see the correct selected material number.
Now I need to assign this data to position 100 (because I will use a counter later on, which depends on position 100).
Here you can see more of the raw data of the report (I can't insert the complete report here; I use the details area only for testing).
I marked one "group" in which you can see why I can't change the order of the positions. In this case I need to use position 105 to output the material number (the rightmost number on the red border) because there is no position 150.
Raw data 2
Here is another example with position 150 used for the material number (the correct material number will be placed on position 105 every time):
Raw data 3
To use this material number in my following tables, it needs to be assigned to position 100.
Thanks!

Replacing substrings based on lists

I am trying to replace substrings in a data frame by the lists "name" and "lemma". As long as I enter the lists manually, the code delivers the result in the dataframe m.
name=['Charge','charge','Prepaid']
lemma=['Hallo','hallo','Hi']
m=sdf.replace(regex= name, value =lemma)
As soon as I read both lists from an Excel file, my code no longer replaces the substrings. I need to use an Excel file, since the lists are in one very large table.
sdf= pd.read_excel('training_data.xlsx')
synonyms= pd.read_excel('synonyms.xlsx')
lemma=synonyms['lemma'].tolist()
name=synonyms['name'].tolist()
m=sdf.replace(regex= name, value =lemma)
Thanks for your help!
df.replace()
Replace values given in to_replace with value.
Values of the DataFrame are replaced with other values dynamically. This differs from updating with .loc or .iloc, which require you to specify a location to update with some value.
In short, this method does not make changes at the Series level, only on values.
This may achieve what you want:
sdf.regex = synonyms.name
sdf.value = synonyms.lemma
If you are just trying to replace 'Charge' with 'Hallo', 'charge' with 'hallo' and 'Prepaid' with 'Hi', then you can use replace(), passing the list of words to find as the first argument and the list of replacement words as the keyword argument value.
Try this:
df=df.replace(name, value=lemma)
Example:
name=['Charge','charge','Prepaid']
lemma=['Hallo','hallo','Hi']
df = pd.DataFrame([['Bob', 'Charge', 'E333', 'B442'],
                   ['Karen', 'V434', 'Prepaid', 'B442'],
                   ['Jill', 'V434', 'E333', 'charge'],
                   ['Hank', 'Charge', 'E333', 'B442']],
                  columns=['Name', 'ID_First', 'ID_Second', 'ID_Third'])
df=df.replace(name, value=lemma)
print(df)
Output:
    Name ID_First ID_Second ID_Third
0    Bob    Hallo      E333     B442
1  Karen     V434        Hi     B442
2   Jill     V434      E333    hallo
3   Hank    Hallo      E333     B442
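Note that the example above replaces whole cell values. For substrings inside longer text, as the question describes, regex=True is needed. The sketch below also strips whitespace from the lists first; stray spaces in Excel cells are only one possible cause of the problem and the post does not confirm it. The synonyms frame here is built in memory as a stand-in for synonyms.xlsx, and the text column is made up.

```python
import pandas as pd

# Stand-in for pd.read_excel('synonyms.xlsx'); note the stray spaces.
synonyms = pd.DataFrame({
    "name": ["Charge ", "charge", " Prepaid"],
    "lemma": ["Hallo", "hallo ", "Hi"],
})

# Drop empty rows and trim whitespace so the patterns match the text exactly.
synonyms = synonyms.dropna()
name = synonyms["name"].astype(str).str.strip().tolist()
lemma = synonyms["lemma"].astype(str).str.strip().tolist()

# Made-up training data with the search terms embedded in longer strings.
sdf = pd.DataFrame({"text": ["Charge the card", "no charge today", "Prepaid plan"]})

# regex=True makes replace() substitute matches inside each string.
m = sdf.replace(to_replace=name, value=lemma, regex=True)
print(m)
```

Because the entries are treated as regular expressions, names containing characters such as . or ( would need re.escape before being passed in.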

Dataframe non-null values differ from value_counts() values

There is an inconsistency with dataframes that I can't explain. In the following, I'm not looking for a workaround (I already found one) but for an explanation of what is going on under the hood and how it explains the output.
One of my colleagues, whom I talked into using Python and pandas, has a dataframe "data" with 12,000 rows.
"data" has a column "length" that contains numbers from 0 to 20. She wants to divide the dataframe into groups by length range: 0 to 9 in group 1, 10 to 14 in group 2, 15 and more in group 3. Her solution was to add another column, "group", and fill it with the appropriate values. She wrote the following code:
data['group'] = np.nan
mask = data['length'] < 10;
data['group'][mask] = 1;
mask2 = (data['length'] > 9) & (data['length'] < 15);
data['group'][mask2] = 2;
mask3 = data['length'] > 14;
data['group'][mask3] = 3;
This code is not good, of course. The reason is that you don't know at run time whether data['group'][mask3], for example, will be a view and thus actually change the dataframe, or a copy, in which case the dataframe remains unchanged. It took me quite some time to explain this to her, since she argued, correctly, that she is doing an assignment, not a selection, so the operation should always return a view.
But that was not the strange part. The part that even I couldn't understand is this:
After performing this set of operations, we checked in two different ways whether the assignment took place:
1. By typing data in the console and examining the dataframe summary. It told us we had a few thousand null values. The number of null values was the same as the size of mask3, so we assumed the last assignment was made on a copy and not on a view.
2. By typing data.group.value_counts(). That returned 3 values: 1, 2 and 3 (surprise). We then typed data.group.value_counts().sum() and it summed up to 12,000!
So by method 2, the group column contained no null values and all the values we wanted it to have. But by method 1, it didn't!
Can anyone explain this?
See docs here.
You don't want to set values this way, for exactly the reason you pointed out: since you don't know whether it's a view, you don't know that you are actually changing the data. Pandas 0.13 will raise/warn that you are attempting to do this, but it is easiest/best to just access like:
data.loc[mask3,'group'] = 3
which guarantees an in-place setitem.
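Applied to all three groups from the question, the .loc form could look like the sketch below. The sample lengths are made up (the original 12,000-row frame is not available), and the boundaries follow the masks in the question's code.

```python
import numpy as np
import pandas as pd

# Made-up sample covering the boundaries of each length range.
data = pd.DataFrame({"length": [0, 9, 10, 14, 15, 20]})
data["group"] = np.nan

# Each assignment selects rows and the column in a single .loc call,
# so it always writes into data itself rather than into a possible copy.
data.loc[data["length"] < 10, "group"] = 1
data.loc[(data["length"] > 9) & (data["length"] < 15), "group"] = 2
data.loc[data["length"] > 14, "group"] = 3

print(data)
print(data["group"].isna().sum())          # number of rows left unassigned
print(data["group"].value_counts().sum())  # number of rows with a group
```

With this form, the null count and the value_counts() total can be compared directly against the number of rows.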